PostgreSQL: Data is Important

June 06, 2011

At our engineering all-hands meeting, I presented a 5-minute lightning talk on the merits of PostgreSQL. In the spirit of being able to search for data, I converted the deck into the following blog post. Enjoy!


PostgreSQL: Data is Important

But Why Postgres?

  • Solves some problems with migrations
  • Full-featured indexes
  • Robust constraints
  • Streaming replication
  • Sophisticated query planner
  • PostGIS
  • Full text search
  • Advanced SQL features

Migration Improvements

No downtime for most DDL migrations. Instantly performs:

Transactional DDL statements (not even Oracle has this)

  • wrap migrations in a SQL transaction
  • roll back everything in migration on a partial failure

Indexes


Constraints

  • CHECK constraints
  • foreign key constraints available for all tables
  • exclusion constraints (new in 9.0)

benefits of validation in the DB

  • reduces dependencies on monolithic app
  • improves consistency guarantees
  • deeper level of safeguards
  • don’t have to worry about prefetching associations
  • just as easy to test

Streaming Replication


Sophisticated Query Planner

  • cost-based (as opposed rule-based) query optimizer
  • heuristics gathered on tables to optimize better
  • costs of random seek, seq scan, etc. configurable
    • e.g making random seeks on an SSD not a bad thing to the optimizer
    • genetic algorithms to plan complex queries.

Better EXPLAIN

MySQL sample

+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
| id | select_type | table               | type  | possible_keys                        | key                                  | key_len | ref  | rows | Extra                        |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+
|  1 | PRIMARY     | NULL                | NULL  | NULL                                 | NULL                                 | NULL    | NULL | NULL | Select tables optimized away |
|  2 | DERIVED     | bacon_cheeseburgers | range | index_bacon_cheeseburgers_on_user_id | index_bacon_cheeseburgers_on_user_id | 5       | NULL |    1 | Using where                  |
+----+-------------+---------------------+-------+--------------------------------------+--------------------------------------+---------+------+------+------------------------------+

Postgres Sample

                                                     QUERY PLAN                                                              
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8.31..8.32 rows=1 width=0)
   ->  GroupAggregate  (cost=0.00..8.30 rows=1 width=12)
         Filter: ((min(bacon_cheeseburgers.created_at) + '7 days'::interval) < max(bacon_cheeseburgers.created_at))
         ->  Index Scan using index_bacon_cheeseburgers_on_user_id on bacon_cheeseburgers  (cost=0.00..8.27 rows=1 width=12)
               Index Cond: ((user_id >= 69) AND (user_id <= 70))

Also... if you have a subquery, postgres won't execute it


PostGIS (for geo queries)

  • extensible GIS system
  • R tree indexes for built-in geometric data types
  • geometric data types
  • free census data (TIGER) for geocoding (addresses→coordinates)
  • Rails integration (example from railsonpostgresql.com)

    Restaurant.first(:conditions => ["the_geom && ?", Polygon.from_coordinates([[[x_min, y_min],
    [x_min, y_max], ...]]],
    4269)])


Full Text Search

  • full-featured text indexing
  • dictionaries (languages, stopwords, synonyms)
  • customizable search ranking algorithms
  • mature
  • lighter weight solution than solr/lucene

Advanced SQL Features

  • EXCEPT and INTERSECT statements in addition to UNION
  • pl/*, custom aggs, UDFs, etc.
  • Triggers
  • Windowing functions (example from postgresql.org)
    (for advanced analytics)

    SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) 
FROM empsalary;
    
      depname  | empno | salary | rank 
    -----------+-------+--------+------
     develop   |     8 |   6000 |    1
     develop   |    11 |   5200 |    2
     develop   |     9 |   4500 |    3
     personnel |     2 |   3900 |    1
     personnel |     5 |   3500 |    2
     sales     |     1 |   5000 |    1
     sales     |     4 |   4800 |    2
    
Mike Lewis
iOS Engineer @mikelewis

Comments

Get support help at squareup.com/support. We'll delete off-topic comments.