SQLAlchemy 0.9 Documentation

Release: 0.9.4 | Release Date: March 28, 2014
SQL Expression Language Tutorial

The SQLAlchemy Expression Language presents a system of representing relational database structures and expressions using Python constructs. These constructs are modeled to resemble those of the underlying database as closely as possible, while providing a modicum of abstraction of the various implementation differences between database backends. While the constructs attempt to represent equivalent concepts between backends with consistent structures, they do not conceal useful concepts that are unique to particular subsets of backends. The Expression Language therefore presents a method of writing backend-neutral SQL expressions, but does not attempt to enforce that expressions are backend-neutral.

The Expression Language stands in contrast to the Object Relational Mapper (ORM), a distinct API that builds on top of the Expression Language. The ORM, introduced in Object Relational Tutorial, presents a high-level, abstracted pattern of usage, and is itself an example of applied usage of the Expression Language. The Expression Language, by contrast, presents a system of representing the primitive constructs of the relational database directly, without opinion.

While there is overlap among the usage patterns of the ORM and the Expression Language, the similarities are more superficial than they may at first appear. One approaches the structure and content of data from the perspective of a user-defined domain model which is transparently persisted and refreshed from its underlying storage model. The other approaches it from the perspective of literal schema and SQL expression representations which are explicitly composed into messages consumed individually by the database.

A successful application may be constructed using the Expression Language exclusively, though the application will need to define its own system of translating application concepts into individual database messages and from individual database result sets. Alternatively, an application constructed with the ORM may, in advanced scenarios, make occasional usage of the Expression Language directly in certain areas where specific database interactions are required.

The following tutorial is in doctest format, meaning each >>> line represents something you can type at a Python command prompt, and the following text represents the expected return value. The tutorial has no prerequisites.

Version Check

A quick check to verify that we are on at least version 0.9 of SQLAlchemy:

>>> import sqlalchemy
>>> sqlalchemy.__version__
'0.9.0'

Connecting

For this tutorial we will use an in-memory-only SQLite database. This is an easy way to test things without needing to have an actual database defined anywhere. To connect we use create_engine():

>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///:memory:', echo=True)

The echo flag is a shortcut to setting up SQLAlchemy logging, which is accomplished via Python’s standard logging module. With it enabled, we’ll see all the generated SQL produced. If you are working through this tutorial and want less output generated, set it to False. In the HTML version of this tutorial the generated SQL is tucked behind “SQL” links so it doesn’t get in our way; in this text it appears directly beneath some examples and is omitted from others.

The return value of create_engine() is an instance of Engine, and it represents the core interface to the database, adapted through a dialect that handles the details of the database and DBAPI in use. In this case the SQLite dialect will interpret instructions to the Python built-in sqlite3 module.

The first time a method like Engine.execute() or Engine.connect() is called, the Engine establishes a real DBAPI connection to the database, which is then used to emit the SQL.

See also

Database Urls - includes examples of create_engine() connecting to several kinds of databases with links to more information.

Define and Create Tables

The SQL Expression Language constructs its expressions in most cases against table columns. In SQLAlchemy, a column is most often represented by an object called Column, and in all cases a Column is associated with a Table. A collection of Table objects and their associated child objects is referred to as database metadata. In this tutorial we will explicitly lay out several Table objects, but note that SQLAlchemy can also “import” whole sets of Table objects automatically from an existing database (this process is called table reflection).

We define our tables all within a catalog called MetaData, using the Table construct, which resembles regular SQL CREATE TABLE statements. We’ll make two tables, one of which represents “users” in an application, and another which represents zero or more “email addresses” for each row in the “users” table:

>>> from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
>>> metadata = MetaData()
>>> users = Table('users', metadata,
...     Column('id', Integer, primary_key=True),
...     Column('name', String),
...     Column('fullname', String),
... )

>>> addresses = Table('addresses', metadata,
...   Column('id', Integer, primary_key=True),
...   Column('user_id', None, ForeignKey('users.id')),
...   Column('email_address', String, nullable=False)
...  )

How to define Table objects, as well as how to create them automatically from an existing database, is described in Describing Databases with MetaData.

Next, to tell the MetaData we’d actually like to create our selection of tables for real inside the SQLite database, we use create_all(), passing it the engine instance which points to our database. This will check for the presence of each table first before creating, so it’s safe to call multiple times:

>>> metadata.create_all(engine)
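As a rough check of the behavior just described, the sketch below calls create_all() twice and then lists the resulting tables through the inspect() function. It assumes SQLAlchemy is installed and redefines the two tables so that it runs on its own; echo is left off to keep the output short, and the user_id column spells out Integer instead of passing None.

```python
from sqlalchemy import (create_engine, inspect, MetaData, Table, Column,
                        Integer, String, ForeignKey)

engine = create_engine('sqlite:///:memory:')
metadata = MetaData()

users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('fullname', String),
)
addresses = Table('addresses', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey('users.id')),
    Column('email_address', String, nullable=False),
)

# The first call emits CREATE TABLE for both tables.
metadata.create_all(engine)
# The second call finds both tables already present and creates nothing.
metadata.create_all(engine)

# Ask the database which tables actually exist.
table_names = sorted(inspect(engine).get_table_names())
print(table_names)
```

Calling create_all() a second time raising no "table already exists" error is the practical meaning of "safe to call multiple times".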

Note

Users familiar with the syntax of CREATE TABLE may notice that the VARCHAR columns were generated without a length; on SQLite and PostgreSQL, this is a valid datatype, but on other databases it’s not allowed. So if you are running this tutorial on one of those databases and wish to use SQLAlchemy to issue CREATE TABLE, a “length” may be provided to the String type as below:

Column('name', String(50))

The length field on String, as well as the similar precision/scale fields available on Numeric and related types, are not referenced by SQLAlchemy other than when creating tables.

Additionally, Firebird and Oracle require sequences to generate new primary key identifiers, and SQLAlchemy doesn’t generate or assume these without being instructed. For that, you use the Sequence construct:

from sqlalchemy import Sequence
Column('id', Integer, Sequence('user_id_seq'), primary_key=True)

A full, foolproof Table is therefore:

users = Table('users', metadata,
   Column('id', Integer, Sequence('user_id_seq'), primary_key=True),
   Column('name', String(50)),
   Column('fullname', String(50)),
   Column('password', String(12))
)

We include this more verbose Table construct separately to highlight the difference between a minimal construct geared primarily towards in-Python usage only, versus one that will be used to emit CREATE TABLE statements on a particular set of backends with more stringent requirements.
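To see what the length argument changes without connecting to a stricter backend, the CREATE TABLE statement can be rendered as a string using the CreateTable construct from sqlalchemy.schema. The table and column names below (demo_users, nickname) are made up for this sketch, and the statement is rendered against SQLAlchemy's default dialect, not any particular database.

```python
from sqlalchemy import MetaData, Table, Column, Integer, String
from sqlalchemy.schema import CreateTable

metadata = MetaData()

# 'demo_users' and 'nickname' are hypothetical names used only here.
demo = Table('demo_users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(50)),   # rendered with an explicit length
    Column('nickname', String),   # rendered as a bare VARCHAR
)

# Render the DDL as text; nothing is sent to a database.
ddl = str(CreateTable(demo))
print(ddl)
```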

Insert Expressions

The first SQL expression we’ll create is the Insert construct, which represents an INSERT statement. This is typically created relative to its target table:

>>> ins = users.insert()

To see a sample of the SQL this construct produces, use the str() function:

>>> str(ins)
'INSERT INTO users (id, name, fullname) VALUES (:id, :name, :fullname)'

Notice above that the INSERT statement names every column in the users table. This can be limited by using the values() method, which establishes the VALUES clause of the INSERT explicitly:

>>> ins = users.insert().values(name='jack', fullname='Jack Jones')
>>> str(ins)
'INSERT INTO users (name, fullname) VALUES (:name, :fullname)'

Above, while the values method limited the VALUES clause to just two columns, the actual data we placed in values didn’t get rendered into the string; instead we got named bind parameters. As it turns out, our data is stored within our Insert construct, but it typically only comes out when the statement is actually executed; since the data consists of literal values, SQLAlchemy automatically generates bind parameters for them. We can peek at this data for now by looking at the compiled form of the statement:

>>> ins.compile().params 
{'fullname': 'Jack Jones', 'name': 'jack'}

Executing

The interesting part of an Insert is executing it. In this tutorial, we will generally focus on the most explicit method of executing a SQL construct, and later touch upon some “shortcut” ways to do it. The engine object we created is a repository for database connections capable of issuing SQL to the database. To acquire a connection, we use the connect() method:

>>> conn = engine.connect()
>>> conn 
<sqlalchemy.engine.base.Connection object at 0x...>

The Connection object represents an actively checked out DBAPI connection resource. Let’s feed it our Insert object and see what happens:

>>> result = conn.execute(ins)
INSERT INTO users (name, fullname) VALUES (?, ?)
('jack', 'Jack Jones')
COMMIT

So the INSERT statement was now issued to the database, although we got positional “qmark” bind parameters instead of “named” bind parameters in the output. How come? Because when executed, the Connection used the SQLite dialect to help generate the statement; when we use the str() function, the statement isn’t aware of this dialect, and falls back onto a default which uses named parameters. We can view this manually as follows:

>>> ins.bind = engine
>>> str(ins)
'INSERT INTO users (name, fullname) VALUES (?, ?)'
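Another way to see dialect-specific rendering, one that needs no Engine at all, is to pass a dialect object to compile(). The sketch below assumes the dialect class exposed as sqlalchemy.dialects.sqlite.dialect and redefines the users table so that it runs on its own.

```python
from sqlalchemy import MetaData, Table, Column, Integer, String
from sqlalchemy.dialects import sqlite

metadata = MetaData()
users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('fullname', String),
)

ins = users.insert().values(name='jack', fullname='Jack Jones')

# With no dialect, the default compiler uses named parameters.
default_sql = str(ins)
# With the SQLite dialect, the same construct uses positional "qmark" markers.
sqlite_sql = str(ins.compile(dialect=sqlite.dialect()))

print(default_sql)
print(sqlite_sql)
```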

What about the result variable we got when we called execute()? As the SQLAlchemy Connection object references a DBAPI connection, the result, known as a ResultProxy object, is analogous to the DBAPI cursor object. In the case of an INSERT, we can get important information from it, such as the primary key values which were generated from our statement:

>>> result.inserted_primary_key
[1]

The value of 1 was automatically generated by SQLite, but only because we did not specify the id column in our Insert statement; otherwise, our explicit value would have been used. In either case, SQLAlchemy always knows how to get at a newly generated primary key value, even though the method of generating them is different across different databases; each database’s Dialect knows the specific steps needed to determine the correct value (or values; note that inserted_primary_key returns a list so that it supports composite primary keys).
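For comparison, here is roughly what fetching a generated key looks like when talking to the standard library sqlite3 module directly, with no SQLAlchemy involved. It only illustrates the DBAPI-level mechanism on SQLite (cursor.lastrowid); other databases and drivers expose generated keys in other ways, which is the kind of detail each Dialect takes care of.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name VARCHAR, fullname VARCHAR)")

# No id is supplied, so SQLite generates one for the new row.
cursor = conn.execute(
    "INSERT INTO users (name, fullname) VALUES (?, ?)",
    ('jack', 'Jack Jones'))

# The sqlite3 cursor reports the generated key as lastrowid.
new_id = cursor.lastrowid
print(new_id)
conn.close()
```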

Executing Multiple Statements

Our insert example above was intentionally a little drawn out to show various behaviors of expression language constructs. In the usual case, an Insert statement is compiled against the parameters sent to the execute() method on Connection, so that there’s no need to use the values keyword with Insert. Let’s create a generic Insert statement again and use it in the “normal” way:

>>> ins = users.insert()
>>> conn.execute(ins, id=2, name='wendy', fullname='Wendy Williams')
INSERT INTO users (id, name, fullname) VALUES (?, ?, ?)
(2, 'wendy', 'Wendy Williams')
COMMIT
<sqlalchemy.engine.result.ResultProxy object at 0x...>

Above, because we specified all three columns in the execute() method, the compiled Insert included all three columns. The Insert statement is compiled at execution time based on the parameters we specified; if we specified fewer parameters, the Insert would have fewer entries in its VALUES clause.
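The effect of the parameter names on the compiled statement can be previewed without executing anything, by passing a list of names as the column_keys argument of compile(). This is a sketch for inspection only; the users table is redefined so that the block runs on its own.

```python
from sqlalchemy import MetaData, Table, Column, Integer, String

metadata = MetaData()
users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('fullname', String),
)

ins = users.insert()

# Compile as though three parameters had been supplied...
all_columns = str(ins.compile(column_keys=['id', 'name', 'fullname']))
# ...and as though only two had been supplied.
two_columns = str(ins.compile(column_keys=['name', 'fullname']))

print(all_columns)
print(two_columns)
```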

To issue many inserts using DBAPI’s executemany() method, we can send in a list of dictionaries each containing a distinct set of parameters to be inserted, as we do here to add some email addresses:

>>> conn.execute(addresses.insert(), [ 
...    {'user_id': 1, 'email_address' : 'jack@yahoo.com'},
...    {'user_id': 1, 'email_address' : 'jack@msn.com'},
...    {'user_id': 2, 'email_address' : 'www@www.org'},
...    {'user_id': 2, 'email_address' : 'wendy@aol.com'},
... ])
INSERT INTO addresses (user_id, email_address) VALUES (?, ?)
((1, 'jack@yahoo.com'), (1, 'jack@msn.com'), (2, 'www@www.org'), (2, 'wendy@aol.com'))
COMMIT
<sqlalchemy.engine.result.ResultProxy object at 0x...>

Above, we again relied upon SQLite’s automatic generation of primary key identifiers for each addresses row.

When executing multiple sets of parameters, each dictionary must have the same set of keys; i.e. you can’t have fewer keys in some dictionaries than others. This is because the Insert statement is compiled against the first dictionary in the list, and it’s assumed that all subsequent argument dictionaries are compatible with that statement.
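An application that builds its parameter lists dynamically may want to check this requirement before calling execute(). The helper below is a hypothetical, plain-Python sketch (check_uniform_keys is not part of SQLAlchemy); it is slightly stricter than the rule above, in that it insists on identical key sets.

```python
def check_uniform_keys(param_sets):
    """Raise ValueError unless every dictionary has the keys of the first."""
    expected = set(param_sets[0])
    for position, params in enumerate(param_sets):
        if set(params) != expected:
            raise ValueError(
                "parameter set %d has keys %s, expected %s"
                % (position, sorted(params), sorted(expected)))
    return param_sets

# Every dictionary carries the same keys: accepted as-is.
good = check_uniform_keys([
    {'user_id': 1, 'email_address': 'jack@yahoo.com'},
    {'user_id': 2, 'email_address': 'wendy@aol.com'},
])

# The second dictionary is missing 'email_address': rejected.
try:
    check_uniform_keys([
        {'user_id': 1, 'email_address': 'jack@yahoo.com'},
        {'user_id': 2},
    ])
    message = None
except ValueError as err:
    message = str(err)

print(message)
```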

Selecting

We began with inserts just so that our test database had some data in it. The more interesting part of the data is selecting it! We’ll cover UPDATE and DELETE statements later. The primary construct used to generate SELECT statements is the select() function:

>>> from sqlalchemy.sql import select
>>> s = select([users])
>>> result = conn.execute(s)
SELECT users.id, users.name, users.fullname
FROM users
()

Above, we issued a basic select() call, placing the users table within the COLUMNS clause of the select, and then executing. SQLAlchemy expanded the users table into the set of each of its columns, and also generated a FROM clause for us. The result returned is again a ResultProxy object, which acts much like a DBAPI cursor, including methods such as fetchone() and fetchall(). The easiest way to get rows from it is to just iterate:

>>> for row in result:
...     print row
(1, u'jack', u'Jack Jones')
(2, u'wendy', u'Wendy Williams')

Above, we see that printing each row produces a simple tuple-like result. We have more options for accessing the data in each row. One very common way is through dictionary access, using the string names of columns:

>>> result = conn.execute(s)
>>> row = result.fetchone()
>>> print "name:", row['name'], "; fullname:", row['fullname']
name: jack ; fullname: Jack Jones

Integer indexes work as well:

>>> row = result.fetchone()
>>> print "name:", row[1], "; fullname:", row[2]
name: wendy ; fullname: Wendy Williams

But another way, whose usefulness will become apparent later on, is to use the Column objects directly as keys:

>>> for row in conn.execute(s):
...     print "name:", row[users.c.name], "; fullname:", row[users.c.fullname]
name: jack ; fullname: Jack Jones
name: wendy ; fullname: Wendy Williams

Result sets which have pending rows remaining should be explicitly closed before discarding. While the cursor and connection resources referenced by the ResultProxy will be respectively closed and returned to the connection pool when the object is garbage collected, it’s better to make it explicit as some database APIs are very picky about such things:

>>> result.close()

If we’d like to more carefully control the columns which are placed in the COLUMNS clause of the select, we reference individual Column objects from our Table. These are available as named attributes off the c attribute of the Table object:

>>> s = select([users.c.name, users.c.fullname])
>>> result = conn.execute(s)
>>> for row in result:
...     print row
(u'jack', u'Jack Jones')
(u'wendy', u'Wendy Williams')

Let’s observe something interesting about the FROM clause. Whereas the generated statement contains two distinct sections, a “SELECT columns” part and a “FROM table” part, our select() construct only has a list containing columns. How does this work? Let’s try putting two tables into our select() statement:

>>> for row in conn.execute(select([users, addresses])):
...     print row
(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com')
(1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com')
(1, u'jack', u'Jack Jones', 3, 2, u'www@www.org')
(1, u'jack', u'Jack Jones', 4, 2, u'wendy@aol.com')
(2, u'wendy', u'Wendy Williams', 1, 1, u'jack@yahoo.com')
(2, u'wendy', u'Wendy Williams', 2, 1, u'jack@msn.com')
(2, u'wendy', u'Wendy Williams', 3, 2, u'www@www.org')
(2, u'wendy', u'Wendy Williams', 4, 2, u'wendy@aol.com')

It placed both tables into the FROM clause. But also, it made a real mess. Those who are familiar with SQL joins know that this is a Cartesian product; each row from the users table is produced against each row from the addresses table. So to put some sanity into this statement, we need a WHERE clause. We do that using Select.where():

>>> s = select([users, addresses]).where(users.c.id == addresses.c.user_id)
>>> for row in conn.execute(s):
...     print row
(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com')
(1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com')
(2, u'wendy', u'Wendy Williams', 3, 2, u'www@www.org')
(2, u'wendy', u'Wendy Williams', 4, 2, u'wendy@aol.com')
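The row counts above follow directly from the arithmetic of a Cartesian product: 2 users times 4 addresses gives 8 rows, and the WHERE criterion keeps only the 4 matching pairs. The sketch below reproduces those counts using the standard library sqlite3 module with hand-written SQL; it stands apart from the tutorial's engine and illustrates only the SQL behavior.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR)")
conn.execute(
    "CREATE TABLE addresses ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, email_address VARCHAR)")

conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, 'jack'), (2, 'wendy')])
conn.executemany(
    "INSERT INTO addresses (user_id, email_address) VALUES (?, ?)",
    [(1, 'jack@yahoo.com'), (1, 'jack@msn.com'),
     (2, 'www@www.org'), (2, 'wendy@aol.com')])

# Two tables in FROM with no criterion: every user paired with every address.
cartesian = conn.execute(
    "SELECT count(*) FROM users, addresses").fetchone()[0]

# Adding the WHERE criterion keeps only the pairs that belong together.
joined = conn.execute(
    "SELECT count(*) FROM users, addresses "
    "WHERE users.id = addresses.user_id").fetchone()[0]

print(cartesian, joined)
conn.close()
```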

So that looks a lot better. We added an expression to our select() which had the effect of adding WHERE users.id = addresses.user_id to our statement, and our results were narrowed down so that the join of users and addresses rows made sense. But let’s look at that expression. It’s using just a Python equality operator between two different Column objects. It should be clear that something is up. Saying 1 == 1 produces True, and 1 == 2 produces False, not a WHERE clause. So let’s see exactly what that expression is doing:

>>> users.c.id == addresses.c.user_id 
<sqlalchemy.sql.expression.BinaryExpression object at 0x...>

Wow, surprise! This is neither a True nor a False. Well, what is it?

>>> str(users.c.id == addresses.c.user_id)
'users.id = addresses.user_id'

As you can see, the == operator is producing an object that is very much like the Insert and select() objects we’ve made so far, thanks to Python’s __eq__() special method; you call str() on it and it produces SQL. By now, one can see that everything we are working with is ultimately the same type of object. SQLAlchemy terms the base class of all of these expressions as ColumnElement.
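The general mechanism can be imitated in a few lines of plain Python. The ToyColumn and ToyExpression classes below are invented for this illustration and are not how SQLAlchemy is implemented; they only show that overriding __eq__() lets == return an object that renders as SQL instead of True or False.

```python
class ToyExpression(object):
    """A left operand, an operator and a right operand, rendered by str()."""

    def __init__(self, left, operator, right):
        self.left = left
        self.operator = operator
        self.right = right

    def __str__(self):
        return "%s %s %s" % (self.left, self.operator, self.right)


class ToyColumn(object):
    """A column that builds an expression when compared with ==."""

    def __init__(self, table, name):
        self.table = table
        self.name = name

    def __str__(self):
        return "%s.%s" % (self.table, self.name)

    def __eq__(self, other):
        # Return an expression object, not a boolean.
        return ToyExpression(self, '=', other)

    # Defining __eq__ removes the default hash on Python 3; restore it.
    __hash__ = object.__hash__


expr = ToyColumn('users', 'id') == ToyColumn('addresses', 'user_id')
rendered = str(expr)
print(rendered)
```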

Operators

Since we’ve stumbled upon SQLAlchemy’s operator paradigm, let’s go through some of its capabilities. We’ve seen how to equate two columns to each other:

>>> print users.c.id == addresses.c.user_id
users.id = addresses.user_id

If we use a literal value (a literal meaning a plain Python value, not a SQLAlchemy clause object), we get a bind parameter:

>>> print users.c.id == 7
users.id = :id_1

The 7 literal is embedded in the resulting ColumnElement; we can use the same trick we did with the Insert object to see it:

>>> (users.c.id == 7).compile().params
{u'id_1': 7}

Most Python operators, as it turns out, produce a SQL expression here, like equals, not equals, etc.:

>>> print users.c.id != 7
users.id != :id_1

>>> # None converts to IS NULL
>>> print users.c.name == None
users.name IS NULL

>>> # reverse works too
>>> print 'fred' > users.c.name
users.name < :name_1

If we add two integer columns together, we get an addition expression:

>>> print users.c.id + addresses.c.id
users.id + addresses.id

Interestingly, the type of the Column is important! If we use + with two string based columns (recall we put types like Integer and String on our Column objects at the beginning), we get something different:

>>> print users.c.name + users.c.fullname
users.name || users.fullname

Where || is the string concatenation operator used on most databases. But not all of them. MySQL users, fear not:

>>> print (users.c.name + users.c.fullname).\
...      compile(bind=create_engine('mysql://'))
concat(users.name, users.fullname)

The above illustrates the SQL that’s generated for an Engine that’s connected to a MySQL database; the || operator now compiles as MySQL’s concat() function.
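Note that create_engine('mysql://') needs a MySQL DBAPI driver to be importable. A variation that avoids this, sketched below, compiles against a dialect object taken from sqlalchemy.dialects.mysql. The lightweight column() construct stands in for the tutorial's Table columns here, so the rendered names carry no table prefix.

```python
from sqlalchemy import String
from sqlalchemy.sql import column
from sqlalchemy.dialects import mysql

# Two string-typed columns, so that + means concatenation.
name = column('name', type_=String)
fullname = column('fullname', type_=String)
expr = name + fullname

# Default rendering uses the || operator.
default_sql = str(expr)
# The MySQL dialect renders the same expression with concat().
mysql_sql = str(expr.compile(dialect=mysql.dialect()))

print(default_sql)
print(mysql_sql)
```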

If you have come across an operator which really isn’t available, you can always use the ColumnOperators.op() method; this generates whatever operator you need:

>>> print users.c.name.op('tiddlywinks')('foo')
users.name tiddlywinks :name_1

This function can also be used to make bitwise operators explicit. For example:

somecolumn.op('&')(0xff)

is a bitwise AND of the value in somecolumn.
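As a quick sketch of what that expression renders to, the snippet below builds it against a standalone column() construct; somecolumn is given an Integer type here purely as an assumption for the example. The literal 0xff becomes a bind parameter, as other literals do.

```python
from sqlalchemy import Integer
from sqlalchemy.sql import column

somecolumn = column('somecolumn', type_=Integer)
expr = somecolumn.op('&')(0xff)

# The custom operator is rendered verbatim between the two operands.
sql = str(expr)
# The literal 0xff (255) travels as a bind parameter.
params = expr.compile().params

print(sql)
print(params)
```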

Operator Customization

While ColumnOperators.op() is handy to get at a custom operator in a hurry, the Core supports fundamental customization and extension of the operator system at the type level. The behavior of existing operators can be modified on a per-type basis, and new operations can be defined which become available for all column expressions that are part of that particular type. See the section Redefining and Creating New Operators for a description.

Conjunctions

We’d like to show off some of our operators inside of select() constructs. But we need to lump them together a little more, so let’s first introduce some conjunctions. Conjunctions are those little words like AND and OR that put things together. We’ll also hit upon NOT. AND, OR, and NOT are available through the corresponding functions SQLAlchemy provides, and_(), or_(), and not_() (notice we also throw in a like()):

>>> from sqlalchemy.sql import and_, or_, not_
>>> print and_(
...         users.c.name.like('j%'),
...         users.c.id == addresses.c.user_id, 
...         or_(
...              addresses.c.email_address == 'wendy@aol.com',
...              addresses.c.email_address == 'jack@yahoo.com'
...         ),
...         not_(users.c.id > 5)
...       )
users.name LIKE :name_1 AND users.id = addresses.user_id AND
(addresses.email_address = :email_address_1
   OR addresses.email_address = :email_address_2)
AND users.id <= :id_1

And you can also use the re-jiggered bitwise AND, OR and NOT operators, although because of Python operator precedence you have to watch your parentheses:

>>> print users.c.name.like('j%') & (users.c.id == addresses.c.user_id) &  \
...     (
...       (addresses.c.email_address == 'wendy@aol.com') | \
...       (addresses.c.email_address == 'jack@yahoo.com')
...     ) \
...     & ~(users.c.id>5) 
users.name LIKE :name_1 AND users.id = addresses.user_id AND
(addresses.email_address = :email_address_1
    OR addresses.email_address = :email_address_2)
AND users.id <= :id_1
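The parentheses matter because Python's &, | and ~ bind more tightly than comparison operators such as == and >, so each comparison has to be wrapped before it is combined. The sketch below builds a smaller criterion both ways, using standalone column() constructs in place of the tutorial's tables, and prints the two renderings for comparison.

```python
from sqlalchemy.sql import column, and_, or_, not_

name = column('name')
id_ = column('id')

# Function style: no extra parentheses are needed.
with_functions = and_(
    name.like('j%'),
    or_(id_ == 5, id_ == 7),
    not_(id_ > 10),
)

# Operator style: every comparison is wrapped in its own parentheses.
with_operators = (
    name.like('j%')
    & ((id_ == 5) | (id_ == 7))
    & ~(id_ > 10)
)

functions_sql = str(with_functions)
operators_sql = str(with_operators)
print(functions_sql)
print(operators_sql)
```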

So with all of this vocabulary, let’s select all users who have an email address at AOL or MSN, whose name starts with a letter between “m” and “z”, and we’ll also generate a column containing their full name combined with their email address. We will add two new constructs to this statement, between() and label(). between() produces a BETWEEN clause, and label() is used in a column expression to produce labels using the AS keyword; it’s recommended when selecting from expressions that otherwise would not have a name:

>>> s = select([(users.c.fullname +
...               ", " + addresses.c.email_address).
...                label('title')]).\
...        where(
...           and_(
...               users.c.id == addresses.c.user_id,
...               users.c.name.between('m', 'z'),
...               or_(
...                  addresses.c.email_address.like('%@aol.com'),
...                  addresses.c.email_address.like('%@msn.com')
...               )
...           )
...        )
>>> conn.execute(s).fetchall() 
SELECT users.fullname || ? || addresses.email_address AS title
FROM users, addresses
WHERE users.id = addresses.user_id AND users.name BETWEEN ? AND ? AND
(addresses.email_address LIKE ? OR addresses.email_address LIKE ?)
(', ', 'm', 'z', '%@aol.com', '%@msn.com')
[(u'Wendy Williams, wendy@aol.com',)]

Once again, SQLAlchemy figured out the FROM clause for our statement. In fact it will determine the FROM clause based on all of its other bits; the columns clause, the where clause, and also some other elements which we haven’t covered yet, which include ORDER BY, GROUP BY, and HAVING.

A shortcut to using and_() is to chain together multiple where() clauses. The above can also be written as:

>>> s = select([(users.c.fullname +
...               ", " + addresses.c.email_address).
...                label('title')]).\
...        where(users.c.id == addresses.c.user_id).\
...        where(users.c.name.between('m', 'z')).\
...        where(
...               or_(
...                  addresses.c.email_address.like('%@aol.com'),
...                  addresses.c.email_address.like('%@msn.com')
...               )
...        )
>>> conn.execute(s).fetchall() 
SELECT users.fullname || ? || addresses.email_address AS title
FROM users, addresses
WHERE users.id = addresses.user_id AND users.name BETWEEN ? AND ? AND
(addresses.email_address LIKE ? OR addresses.email_address LIKE ?)
(', ', 'm', 'z', '%@aol.com', '%@msn.com')
[(u'Wendy Williams, wendy@aol.com',)]

The way that we can build up a select() construct through successive method calls is called method chaining.
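One consequence of this style, sketched below, is that each call returns a new construct and leaves the one it was called on unchanged, so a base statement can be reused as a starting point. The example uses the Table.select() method as a compact way to produce a SELECT against the users table, which is redefined so that the block runs on its own.

```python
from sqlalchemy import MetaData, Table, Column, Integer, String

metadata = MetaData()
users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('fullname', String),
)

base = users.select()
# where() hands back a new statement; 'base' is not modified.
filtered = base.where(users.c.name == 'jack')

base_sql = str(base)
filtered_sql = str(filtered)
print(base_sql)
print(filtered_sql)
```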

Using Text

Our last example really became a handful to type. Going from what one understands to be a textual SQL expression into a Python construct which groups components together in a programmatic style can be hard. That’s why SQLAlchemy lets you just use strings too. The text() construct represents any textual statement, in a backend-agnostic way. To use bind parameters with text(), always use the named colon format. Below, we create a text() and execute it, feeding in the bind parameters to the execute() method:

>>> from sqlalchemy.sql import text
>>> s = text(
...     "SELECT users.fullname || ', ' || addresses.email_address AS title "
...         "FROM users, addresses "
...         "WHERE users.id = addresses.user_id "
...         "AND users.name BETWEEN :x AND :y "
...         "AND (addresses.email_address LIKE :e1 "
...             "OR addresses.email_address LIKE :e2)")
>>> conn.execute(s, x='m', y='z', e1='%@aol.com', e2='%@msn.com').fetchall()
[(u'Wendy Williams, wendy@aol.com',)]

To gain a “hybrid” approach, the select() construct accepts strings for most of its arguments. Below we combine the usage of strings with our constructed select() object, by using the select() object to structure the statement, and strings to provide all the content within the structure. For this example, SQLAlchemy is not given any Column or Table objects in any of its expressions, so it cannot generate a FROM clause. So we also use the select_from() method, which accepts a FromClause or string expression to be placed within the FROM clause:

>>> s = select([
...            "users.fullname || ', ' || addresses.email_address AS title"
...          ]).\
...           where(
...              and_(
...                 "users.id = addresses.user_id",
...                 "users.name BETWEEN 'm' AND 'z'",
...                 "(addresses.email_address LIKE :x OR addresses.email_address LIKE :y)"
...             )
...           ).select_from('users, addresses')
>>> conn.execute(s, x='%@aol.com', y='%@msn.com').fetchall()
[(u'Wendy Williams, wendy@aol.com',)]

Going from constructed SQL to text, we lose some capabilities. We lose the capability for SQLAlchemy to compile our expression to a specific target database; above, our expression won’t work with MySQL since it has no || construct. It also becomes more tedious for SQLAlchemy to be made aware of the datatypes in use; for example, if our bind parameters required UTF-8 encoding before going in, or conversion from a Python datetime into a string (as is required with SQLite), we would have to add extra information to our text() construct. Similar issues arise on the result set side, where SQLAlchemy also performs type-specific data conversion in some cases; still more information can be added to text() to work around this. But what we really lose from our statement is the ability to manipulate it, transform it, and analyze it. These features are critical when using the ORM, which makes heavy usage of relational transformations. To show off what we mean, we’ll first introduce the ALIAS construct and the JOIN construct, just so we have some juicier bits to play with.

Using Aliases

The alias in SQL corresponds to a “renamed” version of a table or SELECT statement, which occurs anytime you say “SELECT .. FROM sometable AS someothername”. The AS creates a new name for the table. Aliases are a key construct as they allow any table or subquery to be referenced by a unique name. In the case of a table, this allows the same table to be named in the FROM clause multiple times. In the case of a SELECT statement, it provides a parent name for the columns represented by the statement, allowing them to be referenced relative to this name.

In SQLAlchemy, any Table, select() construct, or other selectable can be turned into an alias using the FromClause.alias() method, which produces an Alias construct. As an example, suppose we know that our user jack has two particular email addresses. How can we locate jack based on the combination of those two addresses? To accomplish this, we’d use a join to the addresses table, once for each address. We create two Alias constructs against addresses, and then use them both within a select() construct:

>>> a1 = addresses.alias()
>>> a2 = addresses.alias()
>>> s = select([users]).\
...        where(and_(
...            users.c.id == a1.c.user_id,
...            users.c.id == a2.c.user_id,
...            a1.c.email_address == 'jack@msn.com',
...            a2.c.email_address == 'jack@yahoo.com'
...        ))
>>> conn.execute(s).fetchall()
[(1, u'jack', u'Jack Jones')]

Note that the Alias construct generated the names addresses_1 and addresses_2 in the final SQL result. The generation of these names is determined by the position of the construct within the statement. If we created a query using only the second a2 alias, the name would come out as addresses_1. The generation of the names is also deterministic, meaning the same SQLAlchemy statement construct will produce the identical SQL string each time it is rendered for a particular dialect.

Since on the outside, we refer to the alias using the Alias construct itself, we don’t need to be concerned about the generated name. However, for the purposes of debugging, it can be specified by passing a string name to the FromClause.alias() method:

>>> a1 = addresses.alias('a1')
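The naming rules can be checked by rendering a few statements as strings, as in the sketch below. It uses Table.select() to keep the statements short and redefines both tables so that it runs on its own; nothing is executed against a database.

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String,
                        ForeignKey, and_)

metadata = MetaData()
users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('fullname', String),
)
addresses = Table('addresses', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey('users.id')),
    Column('email_address', String, nullable=False),
)

a1 = addresses.alias()
a2 = addresses.alias()

# Both anonymous aliases in one statement: numbered by position.
both_sql = str(users.select().where(and_(
    users.c.id == a1.c.user_id,
    users.c.id == a2.c.user_id,
)))

# Only the second alias in a statement: it is now the first one rendered.
only_a2_sql = str(users.select().where(users.c.id == a2.c.user_id))

# An explicitly named alias keeps the name it was given.
named = addresses.alias('a1')
named_sql = str(users.select().where(users.c.id == named.c.user_id))

print(both_sql)
print(only_a2_sql)
print(named_sql)
```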

Aliases can of course be used for anything which you can SELECT from, including SELECT statements themselves. We can self-join the users table back to the select() we’ve created by making an alias of the entire statement. The correlate(None) directive is to avoid SQLAlchemy’s attempt to “correlate” the inner users table with the outer one:

>>> a1 = s.correlate(None).alias()
>>> s = select([users.c.name]).where(users.c.id == a1.c.id)
>>> conn.execute(s).fetchall()
[(u'jack',)]

Using Joins

We’re halfway along to being able to construct any SELECT expression. The next cornerstone of the SELECT is the JOIN expression. We’ve already been doing joins in our examples, by just placing two tables in either the columns clause or the where clause of the select() construct. But if we want to make a real “JOIN” or “OUTER JOIN” construct, we use the join() and outerjoin() methods, most commonly accessed from the left table in the join:

>>> print users.join(addresses)
users JOIN addresses ON users.id = addresses.user_id

The alert reader will see more surprises; SQLAlchemy figured out how to JOIN the two tables! The ON condition of the join, as it’s called, was automatically generated based on the ForeignKey object which we placed on the addresses table way at the beginning of this tutorial. Already the join() construct is looking like a much better way to join tables.
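The generated ON condition is available on the join construct as its onclause attribute. The sketch below also shows, under the assumption of a made-up notes table that has no ForeignKey to users, that join() cannot guess a condition when no foreign key relationship exists and raises an error instead.

```python
from sqlalchemy import (MetaData, Table, Column, Integer, String,
                        ForeignKey, exc)

metadata = MetaData()
users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('fullname', String),
)
addresses = Table('addresses', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey('users.id')),
    Column('email_address', String, nullable=False),
)
# 'notes' is a hypothetical table with no foreign key to 'users'.
notes = Table('notes', metadata,
    Column('id', Integer, primary_key=True),
    Column('body', String),
)

# The ON condition derived from the ForeignKey on addresses.user_id.
on_clause = str(users.join(addresses).onclause)
print(on_clause)

# With no foreign key to work from, an explicit condition is required.
try:
    users.join(notes)
    join_error = None
except exc.ArgumentError as err:
    join_error = err

print(join_error)
```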

Of course you can join on whatever expression you want, such as if we want to join on all users who use the same name in their email address as their username:

>>> print users.join(addresses,
...                 addresses.c.email_address.like(users.c.name + '%')
...             )
users JOIN addresses ON addresses.email_address LIKE (users.name || :name_1)

When we create a select() construct, SQLAlchemy looks around at the tables we’ve mentioned and then places them in the FROM clause of the statement. When we use JOINs however, we know what FROM clause we want, so here we make use of the select_from() method:

>>> s = select([users.c.fullname]).select_from(
...    users.join(addresses,
...             addresses.c.email_address.like(users.c.name + '%'))
...    )
>>> conn.execute(s).fetchall()
[(u'Jack Jones',), (u'Jack Jones',), (u'Wendy Williams',)]
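
To see what the rendered statement does at the database level, here is a minimal sketch that runs equivalent SQL through Python’s standard-library sqlite3 module rather than through SQLAlchemy. The table layout and rows are assumed to match the data inserted earlier in this tutorial, and the ORDER BY is added only to make the row order predictable:

```python
import sqlite3

# In-memory database mirroring the tutorial's two tables (assumed data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, fullname TEXT);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        email_address TEXT NOT NULL
    );
    INSERT INTO users VALUES
        (1, 'jack', 'Jack Jones'), (2, 'wendy', 'Wendy Williams');
    INSERT INTO addresses VALUES
        (1, 1, 'jack@yahoo.com'), (2, 1, 'jack@msn.com'),
        (3, 2, 'www@www.org'), (4, 2, 'wendy@aol.com');
""")

# The FROM clause is the explicit JOIN; the ON condition is the LIKE
# expression, with the '%' literal passed as a positional bound value.
sql = (
    "SELECT users.fullname "
    "FROM users JOIN addresses "
    "ON addresses.email_address LIKE (users.name || ?) "
    "ORDER BY addresses.id"
)
rows = conn.execute(sql, ("%",)).fetchall()
print(rows)
```

Only www@www.org is left out, since it begins with neither username.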

The outerjoin() method creates LEFT OUTER JOIN constructs, and is used in the same way as join():

>>> s = select([users.c.fullname]).select_from(users.outerjoin(addresses))
>>> print s  
SELECT users.fullname
    FROM users
    LEFT OUTER JOIN addresses ON users.id = addresses.user_id

That’s the output outerjoin() produces, unless, of course, you’re stuck in a gig using Oracle prior to version 9, and you’ve set up your engine (which would be using OracleDialect) to use Oracle-specific SQL:

>>> from sqlalchemy.dialects.oracle import dialect as OracleDialect
>>> print s.compile(dialect=OracleDialect(use_ansi=False))  
SELECT users.fullname
FROM users, addresses
WHERE users.id = addresses.user_id(+)

If you don’t know what that SQL means, don’t worry! The secret tribe of Oracle DBAs don’t want their black magic being found out ;).
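
For readers who want to see how the two join types differ in their results, the following sketch runs the ANSI forms of both statements directly with the standard-library sqlite3 module, not with SQLAlchemy. The user fred, who has no address, is a hypothetical row added purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, fullname TEXT);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        email_address TEXT NOT NULL
    );
    -- 'fred' is a hypothetical user with no addresses.
    INSERT INTO users VALUES
        (1, 'jack', 'Jack Jones'), (3, 'fred', 'Fred Flint');
    INSERT INTO addresses VALUES (1, 1, 'jack@yahoo.com');
""")

columns = "SELECT users.fullname, addresses.email_address FROM users "
on_clause = "addresses ON users.id = addresses.user_id ORDER BY users.id"

# Plain JOIN: only users that have at least one matching address row.
inner = conn.execute(columns + "JOIN " + on_clause).fetchall()

# LEFT OUTER JOIN: every user; missing address columns come back as NULL.
outer = conn.execute(columns + "LEFT OUTER JOIN " + on_clause).fetchall()

print(inner)
print(outer)
```

The outer join keeps fred’s row and fills the address column with None, while the plain join drops it.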

Everything Else

The concepts of creating SQL expressions have been introduced. What’s left are more variants of the same themes. So now we’ll catalog the rest of the important things we’ll need to know.

Bind Parameter Objects

Throughout all these examples, SQLAlchemy is busy creating bind parameters wherever literal expressions occur. You can also specify your own bind parameters with your own names, and use the same statement repeatedly. The bindparam() construct is used to produce a bound parameter with a given name. While SQLAlchemy always refers to bound parameters by name on the API side, the database dialect converts to the appropriate named or positional style at execution time, as here where it converts to positional for SQLite:

>>> from sqlalchemy.sql import bindparam
>>> s = users.select(users.c.name == bindparam('username'))
>>> conn.execute(s, username='wendy').fetchall()
[(2, u'wendy', u'Wendy Williams')]
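
The difference between the named and positional styles can be seen at the DBAPI level. The sketch below uses the standard-library sqlite3 driver on its own, which happens to accept both styles, and assumes the same users rows as the tutorial. It illustrates the two parameter styles only; it does not show how SQLAlchemy performs the conversion internally:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, fullname TEXT);
    INSERT INTO users VALUES
        (1, 'jack', 'Jack Jones'), (2, 'wendy', 'Wendy Williams');
""")

# Named style: the placeholder carries a name; values arrive as a mapping.
named = conn.execute(
    "SELECT id, name, fullname FROM users WHERE name = :username",
    {"username": "wendy"},
).fetchall()

# Positional ("qmark") style: values arrive as a sequence, matched by order.
positional = conn.execute(
    "SELECT id, name, fullname FROM users WHERE name = ?",
    ("wendy",),
).fetchall()

# The driver advertises its default style through the DBAPI attribute.
print(sqlite3.paramstyle)
print(named == positional)
```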

Another important aspect of bindparam() is that it may be assigned a type. The type of the bind parameter will determine its behavior within expressions and also how the data bound to it is processed before being sent off to the database:

>>> s = users.select(users.c.name.like(bindparam('username', type_=String) + text("'%'")))
>>> conn.execute(s, username='wendy').fetchall()
[(2, u'wendy', u'Wendy Williams')]

bindparam() constructs of the same name can also be used multiple times, where only a single named value is needed in the execute parameters:

>>> s = select([users, addresses]).\
...     where(
...        or_(
...          users.c.name.like(
...                 bindparam('name', type_=String) + text("'%'")),
...          addresses.c.email_address.like(
...                 bindparam('name', type_=String) + text("'@%'"))
...        )
...     ).\
...     select_from(users.outerjoin(addresses)).\
...     order_by(addresses.c.id)
>>> conn.execute(s, name='jack').fetchall()
[(1, u'jack', u'Jack Jones', 1, 1, u'jack@yahoo.com'), (1, u'jack', u'Jack Jones', 2, 1, u'jack@msn.com')]
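
The idea of one named value feeding several placeholders can also be tried directly against SQLite. This is a minimal sketch using the standard-library sqlite3 module with hand-written SQL approximating the statement above; the data is assumed to match the tutorial’s rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, fullname TEXT);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY, user_id INTEGER, email_address TEXT
    );
    INSERT INTO users VALUES
        (1, 'jack', 'Jack Jones'), (2, 'wendy', 'Wendy Williams');
    INSERT INTO addresses VALUES
        (1, 1, 'jack@yahoo.com'), (2, 1, 'jack@msn.com'),
        (3, 2, 'www@www.org'), (4, 2, 'wendy@aol.com');
""")

# :name appears twice in the statement but is supplied only once.
sql = (
    "SELECT users.name, addresses.email_address "
    "FROM users LEFT OUTER JOIN addresses "
    "ON users.id = addresses.user_id "
    "WHERE users.name LIKE :name || '%' "
    "OR addresses.email_address LIKE :name || '@%' "
    "ORDER BY addresses.id"
)
rows = conn.execute(sql, {"name": "jack"}).fetchall()
print(rows)
```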

See also

bindparam()

Functions

SQL functions are created using the func keyword, which generates functions using attribute access:

>>> from sqlalchemy.sql import func
>>> print func.now()
now()

>>> print func.concat('x', 'y')
concat(:param_1, :param_2)

By “generates”, we mean that any SQL function is created based on the word you choose:

>>> print func.xyz_my_goofy_function() 
xyz_my_goofy_function()

Certain function names are known by SQLAlchemy, allowing special behavioral rules to be applied. Some, for example, are “ANSI” functions, which means they don’t get parentheses added after them, such as CURRENT_TIMESTAMP:

>>> print func.current_timestamp()
CURRENT_TIMESTAMP
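
The attribute-access idea can be illustrated with a toy generator in plain Python. This is not SQLAlchemy’s implementation; the ToyFunction and ToyFuncGenerator classes are hypothetical names invented for this sketch, and they only imitate the string output shown above:

```python
class ToyFunction:
    """Renders as name(:param_1, ...); a stand-in for a SQL function."""

    def __init__(self, name, args):
        self.name = name
        self.args = args

    def __str__(self):
        # Each positional argument becomes a numbered placeholder, loosely
        # imitating the :param_N names seen in the tutorial output.
        placeholders = ", ".join(
            ":param_%d" % i for i, _ in enumerate(self.args, start=1)
        )
        return "%s(%s)" % (self.name, placeholders)


class ToyFuncGenerator:
    """Any attribute name becomes a factory for a function of that name."""

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so every unknown
        # attribute is treated as the name of a SQL function.
        def factory(*args):
            return ToyFunction(name, args)
        return factory


toy_func = ToyFuncGenerator()
print(toy_func.now())                     # now()
print(toy_func.concat("x", "y"))          # concat(:param_1, :param_2)
print(toy_func.xyz_my_goofy_function())   # xyz_my_goofy_function()
```

A generator of this shape has no list of valid names, which is why a misspelled function name is not caught until the database rejects the statement.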

Functions are most typically used in the columns clause of a select statement, and can also be labeled as well as given a type. Labeling a function is recommended so that the result can be targeted in a result row based on a string name, and assigning it a type is required when you need result-set processing to occur, such as for Unicode conversion and date conversions. Below, we use the result function scalar() to just read the first column of the first row and then close the result; the label, even though present, is not important in this case:

>>> conn.execute(
...     select([
...            func.max(addresses.c.email_address, type_=String).
...                label('maxemail')
...           ])
...     ).scalar() 
SELECT max(addresses.email_address) AS maxemail
FROM addresses
()
u'www@www.org'

Databases such as PostgreSQL and Oracle support functions that return whole result sets; these can be assembled into selectable units, which can then be used in statements. For example, given a database function calculate() which takes the parameters x and y and returns three columns which we’d like to name q, z and r, we can construct the selectable using “lexical” column objects as well as bind parameters:

>>> from sqlalchemy.sql import column
>>> calculate = select([column('q'), column('z'), column('r')]).\
...        select_from(
...             func.calculate(
...                    bindparam('x'),
...                    bindparam('y')
...                )
...             )
>>> calc = calculate.alias()
>>> print select([users]).where(users.c.id > calc.c.z) 
SELECT users.id, users.name, users.fullname
FROM users, (SELECT q, z, r
FROM calculate(:x, :y)) AS anon_1
WHERE users.id > anon_1.z

If we wanted to use our calculate statement twice with different bind parameters, the unique_params() function will create copies for us, and mark the bind parameters as “unique” so that conflicting names are isolated. Note we also make two separate aliases of our selectable:

>>> calc1 = calculate.alias('c1').unique_params(x=17, y=45)
>>> calc2 = calculate.alias('c2').unique_params(x=5, y=12)
>>> s = select([users]).\
...         where(users.c.id.between(calc1.c.z, calc2.c.z))
>>> print s 
SELECT users.id, users.name, users.fullname
FROM users,
    (SELECT q, z, r FROM calculate(:x_1, :y_1)) AS c1,
    (SELECT q, z, r FROM calculate(:x_2, :y_2)) AS c2
WHERE users.id BETWEEN c1.z AND c2.z

>>> s.compile().params
{u'x_2': 5, u'y_2': 12, u'y_1': 45, u'x_1': 17}

See also

func

Window Functions

Any FunctionElement, including functions generated by func, can be turned into a “window function”, that is an OVER clause, using the FunctionElement.over() method:

>>> s = select([
...         users.c.id,
...         func.row_number().over(order_by=users.c.name)
...     ])
>>> print s 
SELECT users.id, row_number() OVER (ORDER BY users.name) AS anon_1
FROM users

Unions and Other Set Operations

Unions come in two flavors, UNION and UNION ALL, which are available via module level functions union() and union_all():

>>> from sqlalchemy.sql import union
>>> u = union(
...     addresses.select().
...             where(addresses.c.email_address == 'foo@bar.com'),
...    addresses.select().
...             where(addresses.c.email_address.like('%@yahoo.com')),
... ).order_by(addresses.c.email_address)

>>> conn.execute(u).fetchall()
[(1, 1, u'jack@yahoo.com')]

Also available, though not supported on all databases, are intersect(), intersect_all(), except_(), and except_all():

>>> from sqlalchemy.sql import except_
>>> u = except_(
...    addresses.select().
...             where(addresses.c.email_address.like('%@%.com')),
...    addresses.select().
...             where(addresses.c.email_address.like('%@msn.com'))
... )

>>> conn.execute(u).fetchall()
[(1, 1, u'jack@yahoo.com'), (4, 2, u'wendy@aol.com')]

A common issue with so-called “compound” selectables arises from the fact that they nest with parentheses. SQLite in particular doesn’t like a statement that starts with a parenthesis. So when nesting a “compound” inside a “compound”, it’s often necessary to apply .alias().select() to the first element of the outermost compound, if that element is also a compound. For example, to nest a “union” and a “select” inside of “except_”, SQLite will want the “union” to be stated as a subquery:

>>> u = except_(
...    union(
...         addresses.select().
...             where(addresses.c.email_address.like('%@yahoo.com')),
...         addresses.select().
...             where(addresses.c.email_address.like('%@msn.com'))
...     ).alias().select(),   # apply subquery here
...    addresses.select(addresses.c.email_address.like('%@msn.com'))
... )
>>> conn.execute(u).fetchall()
[(1, 1, u'jack@yahoo.com')]
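
The SQLite restriction can be reproduced without SQLAlchemy. The sketch below uses the standard-library sqlite3 module with hand-written SQL that approximates the two renderings; the rows are assumed to match the tutorial’s addresses table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY, user_id INTEGER, email_address TEXT
    );
    INSERT INTO addresses VALUES
        (1, 1, 'jack@yahoo.com'), (2, 1, 'jack@msn.com'),
        (3, 2, 'www@www.org'), (4, 2, 'wendy@aol.com');
""")

yahoo = "SELECT * FROM addresses WHERE email_address LIKE '%@yahoo.com'"
msn = "SELECT * FROM addresses WHERE email_address LIKE '%@msn.com'"

# A nested compound written with a leading parenthesis: SQLite rejects it.
try:
    conn.execute(f"({yahoo} UNION {msn}) EXCEPT {msn}")
    leading_paren_ok = True
except sqlite3.OperationalError:
    leading_paren_ok = False

# The same logic with the UNION wrapped as a subquery that we SELECT from.
rows = conn.execute(
    f"SELECT * FROM ({yahoo} UNION {msn}) EXCEPT {msn}"
).fetchall()

print(leading_paren_ok)
print(rows)
```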

Scalar Selects

A scalar select is a SELECT that returns exactly one row and one column. It can then be used as a column expression. A scalar select is often a correlated subquery, which relies upon the enclosing SELECT statement in order to acquire at least one of its FROM clauses.

The select() construct can be modified to act as a column expression by calling either the as_scalar() or label() method:

>>> stmt = select([func.count(addresses.c.id)]).\
...             where(users.c.id == addresses.c.user_id).\
...             as_scalar()

The above construct is now a ScalarSelect object, and is no longer part of the FromClause hierarchy; it instead is within the ColumnElement family of expression constructs. We can place this construct the same as any other column within another select():

>>> conn.execute(select([users.c.name, stmt])).fetchall()  
SELECT users.name, (SELECT count(addresses.id) AS count_1
    FROM addresses
    WHERE users.id = addresses.user_id) AS anon_1
FROM users
()
[(u'jack', 2), (u'wendy', 2)]

To apply a non-anonymous column name to our scalar select, we create it using SelectBase.label() instead:

>>> stmt = select([func.count(addresses.c.id)]).\
...             where(users.c.id == addresses.c.user_id).\
...             label("address_count")
>>> conn.execute(select([users.c.name, stmt])).fetchall()  
SELECT users.name, (SELECT count(addresses.id) AS count_1
    FROM addresses
    WHERE users.id = addresses.user_id) AS address_count
FROM users
()
[(u'jack', 2), (u'wendy', 2)]
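
One practical benefit of the label is that the column can be fetched by name. As a rough illustration outside of SQLAlchemy, the sketch below runs an equivalent labeled scalar subquery through the standard-library sqlite3 module and reads the value through sqlite3.Row; the data is assumed to match the tutorial’s rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, fullname TEXT);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY, user_id INTEGER, email_address TEXT
    );
    INSERT INTO users VALUES
        (1, 'jack', 'Jack Jones'), (2, 'wendy', 'Wendy Williams');
    INSERT INTO addresses VALUES
        (1, 1, 'jack@yahoo.com'), (2, 1, 'jack@msn.com'),
        (3, 2, 'www@www.org'), (4, 2, 'wendy@aol.com');
""")

# The inner SELECT has no users table in its own FROM clause; it refers to
# the users row of the enclosing SELECT, yielding one value per outer row.
rows = conn.execute(
    "SELECT users.name, "
    "(SELECT count(addresses.id) FROM addresses "
    " WHERE users.id = addresses.user_id) AS address_count "
    "FROM users ORDER BY users.id"
).fetchall()

counts = {row["name"]: row["address_count"] for row in rows}
print(counts)
```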

Correlated Subqueries

Notice in the examples on Scalar Selects that the FROM clause of each embedded select did not contain the users table. This is because SQLAlchemy automatically correlates embedded FROM objects to that of an enclosing query, if present, and if the inner SELECT statement would still have at least one FROM clause of its own. For example:

>>> stmt = select([addresses.c.user_id]).\
...             where(addresses.c.user_id == users.c.id).\
...             where(addresses.c.email_address == 'jack@yahoo.com')
>>> enclosing_stmt = select([users.c.name]).where(users.c.id == stmt)
>>> conn.execute(enclosing_stmt).fetchall()  
SELECT users.name
FROM users
WHERE users.id = (SELECT addresses.user_id
    FROM addresses
    WHERE addresses.user_id = users.id
    AND addresses.email_address = ?)
('jack@yahoo.com',)
[(u'jack',)]

Auto-correlation will usually do what’s expected, however it can also be controlled. For example, if we wanted a statement to correlate only to the addresses table but not the users table, even if both were present in the enclosing SELECT, we use the correlate() method to specify those FROM clauses that may be correlated:

>>> stmt = select([users.c.id]).\
...             where(users.c.id == addresses.c.user_id).\
...             where(users.c.name == 'jack').\
...             correlate(addresses)
>>> enclosing_stmt = select(
...         [users.c.name, addresses.c.email_address]).\
...     select_from(users.join(addresses)).\
...     where(users.c.id == stmt)
>>> conn.execute(enclosing_stmt).fetchall()  
SELECT users.name, addresses.email_address
FROM users JOIN addresses ON users.id = addresses.user_id
WHERE users.id = (SELECT users.id
    FROM users
    WHERE users.id = addresses.user_id AND users.name = ?)
('jack',)
[(u'jack', u'jack@yahoo.com'), (u'jack', u'jack@msn.com')]

To entirely disable a statement from correlating, we can pass None as the argument:

>>> stmt = select([users.c.id]).\
...             where(users.c.name == 'wendy').\
...             correlate(None)
>>> enclosing_stmt = select([users.c.name]).\
...     where(users.c.id == stmt)
>>> conn.execute(enclosing_stmt).fetchall()  
SELECT users.name
FROM users
WHERE users.id = (SELECT users.id
    FROM users
    WHERE users.name = ?)
('wendy',)
[(u'wendy',)]

We can also control correlation via exclusion, using the Select.correlate_except() method. For example, we can write our SELECT for the users table by telling it to correlate all FROM clauses except for users:

>>> stmt = select([users.c.id]).\
...             where(users.c.id == addresses.c.user_id).\
...             where(users.c.name == 'jack').\
...             correlate_except(users)
>>> enclosing_stmt = select(
...         [users.c.name, addresses.c.email_address]).\
...     select_from(users.join(addresses)).\
...     where(users.c.id == stmt)
>>> conn.execute(enclosing_stmt).fetchall()  
SELECT users.name, addresses.email_address
FROM users JOIN addresses ON users.id = addresses.user_id
WHERE users.id = (SELECT users.id
    FROM users
    WHERE users.id = addresses.user_id AND users.name = ?)
('jack',)
[(u'jack', u'jack@yahoo.com'), (u'jack', u'jack@msn.com')]

Ordering, Grouping, Limiting, Offset...ing...

Ordering is done by passing column expressions to the order_by() method:

>>> stmt = select([users.c.name]).order_by(users.c.name)
>>> conn.execute(stmt).fetchall()  
SELECT users.name
FROM users ORDER BY users.name
()
[(u'jack',), (u'wendy',)]

Ascending or descending can be controlled using the asc() and desc() modifiers:

>>> stmt = select([users.c.name]).order_by(users.c.name.desc())
>>> conn.execute(stmt).fetchall()  
SELECT users.name
FROM users ORDER BY users.name DESC
()
[(u'wendy',), (u'jack',)]

Grouping refers to the GROUP BY clause, and is usually used in conjunction with aggregate functions to establish groups of rows to be aggregated. This is provided via the group_by() method:

>>> stmt = select([users.c.name, func.count(addresses.c.id)]).\
...             select_from(users.join(addresses)).\
...             group_by(users.c.name)
>>> conn.execute(stmt).fetchall()  
SELECT users.name, count(addresses.id) AS count_1
FROM users JOIN addresses
    ON users.id = addresses.user_id
GROUP BY users.name
()
[(u'jack', 2), (u'wendy', 2)]

HAVING can be used to filter results on an aggregate value, after GROUP BY has been applied. It’s available here via the having() method:

>>> stmt = select([users.c.name, func.count(addresses.c.id)]).\
...             select_from(users.join(addresses)).\
...             group_by(users.c.name).\
...             having(func.length(users.c.name) > 4)
>>> conn.execute(stmt).fetchall()  
SELECT users.name, count(addresses.id) AS count_1
FROM users JOIN addresses
    ON users.id = addresses.user_id
GROUP BY users.name
HAVING length(users.name) > ?
(4,)
[(u'wendy', 2)]
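
HAVING is most often applied to the aggregate itself. The sketch below shows that case using the standard-library sqlite3 module directly instead of SQLAlchemy. The data is a hypothetical variation on the tutorial’s rows in which wendy has only one address, so that the filter has something to remove:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY, user_id INTEGER, email_address TEXT
    );
    INSERT INTO users VALUES (1, 'jack'), (2, 'wendy');
    -- Hypothetical data: jack has two addresses, wendy has one.
    INSERT INTO addresses VALUES
        (1, 1, 'jack@yahoo.com'), (2, 1, 'jack@msn.com'),
        (3, 2, 'wendy@aol.com');
""")

base = (
    "SELECT users.name, count(addresses.id) "
    "FROM users JOIN addresses ON users.id = addresses.user_id "
    "GROUP BY users.name "
)

# Without HAVING: one row per group.
grouped = conn.execute(base + "ORDER BY users.name").fetchall()

# With HAVING: groups are filtered after the counts have been computed.
filtered = conn.execute(
    base + "HAVING count(addresses.id) > ? ORDER BY users.name", (1,)
).fetchall()

print(grouped)
print(filtered)
```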

A common system of dealing with duplicates in composed SELECT statements is the DISTINCT modifier. A simple DISTINCT clause can be added using the Select.distinct() method:

>>> stmt = select([users.c.name]).\
...             where(addresses.c.email_address.
...                    contains(users.c.name)).\
...             distinct()
>>> conn.execute(stmt).fetchall()  
SELECT DISTINCT users.name
FROM users, addresses
WHERE addresses.email_address LIKE '%%' || users.name || '%%'
()
[(u'jack',), (u'wendy',)]

Most database backends support a system of limiting how many rows are returned, and the majority also feature a means of starting to return rows after a given “offset”. While common backends like PostgreSQL, MySQL and SQLite support LIMIT and OFFSET keywords, other backends need to refer to more esoteric features such as “window functions” and row ids to achieve the same effect. The limit() and offset() methods provide an easy abstraction into the current backend’s methodology:

>>> stmt = select([users.c.name, addresses.c.email_address]).\
...             select_from(users.join(addresses)).\
...             limit(1).offset(1)
>>> conn.execute(stmt).fetchall()  
SELECT users.name, addresses.email_address
FROM users JOIN addresses ON users.id = addresses.user_id
LIMIT ? OFFSET ?
(1, 1)
[(u'jack', u'jack@msn.com')]

Inserts, Updates and Deletes

We’ve seen insert() demonstrated earlier in this tutorial. Where insert() produces INSERT, the update() method produces UPDATE. Both of these constructs feature a method called values() which specifies the VALUES or SET clause of the statement.

The values() method accommodates any column expression as a value:

>>> stmt = users.update().\
...             values(fullname="Fullname: " + users.c.name)
>>> conn.execute(stmt) 
UPDATE users SET fullname=(? || users.name)
('Fullname: ',)
COMMIT
<sqlalchemy.engine.result.ResultProxy object at 0x...>

When using insert() or update() in an “execute many” context, we may also want to specify named bound parameters which we can refer to in the argument list. The two constructs will automatically generate bound placeholders for any column names passed in the dictionaries sent to execute() at execution time. However, if we wish to use explicitly targeted named parameters with composed expressions, we need to use the bindparam() construct. When using bindparam() with insert() or update(), the names of the table’s columns themselves are reserved for the “automatic” generation of bind names. We can combine the usage of implicitly available bind names and explicitly named parameters as in the example below:

>>> stmt = users.insert().\
...         values(name=bindparam('_name') + " .. name")
>>> conn.execute(stmt, [               
...        {'id':4, '_name':'name1'},
...        {'id':5, '_name':'name2'},
...        {'id':6, '_name':'name3'},
...     ])
INSERT INTO users (id, name) VALUES (?, (? || ?))
((4, 'name1', ' .. name'), (5, 'name2', ' .. name'), (6, 'name3', ' .. name'))
COMMIT

An UPDATE statement is emitted using the update() construct. This works much like an INSERT, except there is an additional WHERE clause that can be specified:

>>> stmt = users.update().\
...             where(users.c.name == 'jack').\
...             values(name='ed')

>>> conn.execute(stmt) 
UPDATE users SET name=? WHERE users.name = ?
('ed', 'jack')
COMMIT
<sqlalchemy.engine.result.ResultProxy object at 0x...>

When using update() in an “execute many” context, we may wish to also use explicitly named bound parameters in the WHERE clause. Again, bindparam() is the construct used to achieve this:

>>> stmt = users.update().\
...             where(users.c.name == bindparam('oldname')).\
...             values(name=bindparam('newname'))
>>> conn.execute(stmt, [
...     {'oldname':'jack', 'newname':'ed'},
...     {'oldname':'wendy', 'newname':'mary'},
...     {'oldname':'jim', 'newname':'jake'},
...     ]) 
UPDATE users SET name=? WHERE users.name = ?
(('ed', 'jack'), ('mary', 'wendy'), ('jake', 'jim'))
COMMIT
<sqlalchemy.engine.result.ResultProxy object at 0x...>
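
At the DBAPI level, an “execute many” call is one statement paired with a list of parameter sets. A minimal sketch of the same update using the standard-library sqlite3 module on its own, with the users rows assumed from the tutorial:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, fullname TEXT);
    INSERT INTO users VALUES
        (1, 'jack', 'Jack Jones'), (2, 'wendy', 'Wendy Williams');
""")

# One statement, executed once per parameter dictionary. The explicit
# names keep the WHERE value apart from the SET value for the same column.
conn.executemany(
    "UPDATE users SET name = :newname WHERE name = :oldname",
    [
        {"oldname": "jack", "newname": "ed"},
        {"oldname": "wendy", "newname": "mary"},
        {"oldname": "jim", "newname": "jake"},  # matches no row
    ],
)

names = [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]
print(names)
```

The third parameter set matches nothing and is not an error; it simply updates zero rows.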

Correlated Updates

A correlated update lets you update a table using selection from another table, or the same table:

>>> stmt = select([addresses.c.email_address]).\
...             where(addresses.c.user_id == users.c.id).\
...             limit(1)
>>> conn.execute(users.update().values(fullname=stmt)) 
UPDATE users SET fullname=(SELECT addresses.email_address
    FROM addresses
    WHERE addresses.user_id = users.id
    LIMIT ? OFFSET ?)
(1, 0)
COMMIT
<sqlalchemy.engine.result.ResultProxy object at 0x...>
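
To see the effect of such a statement on the data, the sketch below runs an equivalent correlated UPDATE with the standard-library sqlite3 module rather than SQLAlchemy. The rows are assumed from the tutorial, and an ORDER BY is added inside the subquery (it is not part of the statement above) so that the address picked by LIMIT 1 is predictable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, fullname TEXT);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY, user_id INTEGER, email_address TEXT
    );
    INSERT INTO users VALUES
        (1, 'jack', 'Jack Jones'), (2, 'wendy', 'Wendy Williams');
    INSERT INTO addresses VALUES
        (1, 1, 'jack@yahoo.com'), (2, 1, 'jack@msn.com'),
        (3, 2, 'www@www.org'), (4, 2, 'wendy@aol.com');
""")

# The subquery is evaluated once per users row being updated, because it
# refers to users.id from the enclosing UPDATE.
conn.execute(
    "UPDATE users SET fullname = ("
    "  SELECT addresses.email_address FROM addresses"
    "  WHERE addresses.user_id = users.id"
    "  ORDER BY addresses.id LIMIT 1)"
)

updated = conn.execute(
    "SELECT name, fullname FROM users ORDER BY id"
).fetchall()
print(updated)
```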

Multiple Table Updates

New in version 0.7.4.

The PostgreSQL, Microsoft SQL Server, and MySQL backends all support UPDATE statements that refer to multiple tables. For PG and MSSQL, this is the “UPDATE FROM” syntax, which updates one table at a time, but can reference additional tables in an additional “FROM” clause that can then be referenced in the WHERE clause directly. On MySQL, multiple tables can be embedded into a single UPDATE statement separated by a comma. The SQLAlchemy update() construct supports both of these modes implicitly, by specifying multiple tables in the WHERE clause:

stmt = users.update().\
        values(name='ed wood').\
        where(users.c.id == addresses.c.id).\
        where(addresses.c.email_address.startswith('ed%'))
conn.execute(stmt)

The resulting SQL from the above statement would render as:

UPDATE users SET name=:name FROM addresses
WHERE users.id = addresses.id AND
addresses.email_address LIKE :email_address_1 || '%%'

When using MySQL, columns from each table can be assigned to in the SET clause directly, using the dictionary form passed to Update.values():

stmt = users.update().\
        values({
            users.c.name:'ed wood',
            addresses.c.email_address:'ed.wood@foo.com'
        }).\
        where(users.c.id == addresses.c.id).\
        where(addresses.c.email_address.startswith('ed%'))

The tables are referenced explicitly in the SET clause:

UPDATE users, addresses SET addresses.email_address=%s,
        users.name=%s WHERE users.id = addresses.id
        AND addresses.email_address LIKE concat(%s, '%%')

SQLAlchemy doesn’t do anything special when these constructs are used on a non-supporting database. The UPDATE FROM syntax is generated by default when multiple tables are present, and the statement will be rejected by the database if this syntax is not supported.

Deletes

Finally, a delete. This is accomplished easily enough using the delete() construct:

>>> conn.execute(addresses.delete())
DELETE FROM addresses
()
COMMIT
<sqlalchemy.engine.result.ResultProxy object at 0x...>

>>> conn.execute(users.delete().where(users.c.name > 'm'))
DELETE FROM users WHERE users.name > ?
('m',)
COMMIT
<sqlalchemy.engine.result.ResultProxy object at 0x...>

Matched Row Counts

Both update() and delete() are associated with matched row counts. This is a number indicating the number of rows that were matched by the WHERE clause. Note that “matched” includes rows where no UPDATE actually took place. The value is available as rowcount:

>>> result = conn.execute(users.delete())
DELETE FROM users
()
COMMIT
>>> result.rowcount
1
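
The “matched” wording can be checked at the driver level. The sketch below uses the standard-library sqlite3 module, not SQLAlchemy, and shows only how SQLite’s own cursor reports the count; other drivers may report it differently at this level. The rows are assumed from the tutorial:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users VALUES (1, 'jack'), (2, 'wendy');
""")

# The row is matched by the WHERE clause, but its value does not change.
cur = conn.execute(
    "UPDATE users SET name = ? WHERE name = ?", ("jack", "jack")
)
matched_without_change = cur.rowcount

# Only 'wendy' sorts after 'm', so one row is matched and deleted.
cur = conn.execute("DELETE FROM users WHERE name > ?", ("m",))
deleted = cur.rowcount

print(matched_without_change, deleted)
```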

Further Reference

Expression Language Reference: SQL Statements and Expressions API

Database Metadata Reference: Describing Databases with MetaData

Engine Reference: Engine Configuration

Connection Reference: Working with Engines and Connections

Types Reference: Column and Data Types