Postgres Columnar Storage: 4 Popular Extensions and a Quick Tutorial

July 2, 2022

Gilad Kleinman

What Is Columnar Storage in PostgreSQL?

PostgreSQL is a row-oriented database by design, meaning it stores data tuple-by-tuple, with all column values of a row stored together. This structure is suitable for transactional workloads but not optimized for analytical queries that typically scan large volumes of data across a few columns.

Columnar storage stores data column-by-column. This layout allows systems to read only the relevant columns needed for a query, reducing I/O and improving performance for analytical workloads.

While PostgreSQL does not natively support columnar storage, several extensions and external tools introduce columnar capabilities. These solutions allow PostgreSQL to behave more like a columnar database, enabling it to handle analytical workloads more efficiently without departing from its core architecture.

Benefits of Columnar Storage

Columnar databases have become popular in recent years. They offer several advantages, particularly for analytical and data-intensive workloads. While PostgreSQL is not natively a columnar database, there are several workarounds that can achieve some of these benefits:

Improved query performance: Since only the necessary columns are read from disk, queries run faster, reducing overall execution time.
Better compression: Columnar storage allows higher compression rates as similar data is stored together, reducing storage costs and improving I/O performance.
Efficient analytical workloads: Optimized for analytics, this storage format improves aggregation, filtering, and scanning operations.
Reduced I/O overhead: By reading only relevant columns, columnar storage significantly decreases disk I/O, leading to better resource utilization.
Optimized CPU usage: With reduced data processing needs, the CPU spends less time on irrelevant data, improving query efficiency and overall database performance.
Scalability for large datasets: Columnar storage enables PostgreSQL to handle massive datasets efficiently.

The Need for Columnar Storage in PostgreSQL: Common Use Cases

Traditional PostgreSQL storage is optimized for transactional (OLTP) workloads, where operations typically access full rows. However, analytical (OLAP) workloads behave differently—they often scan large volumes of data across a small number of columns. This mismatch creates performance challenges for analytics on PostgreSQL without modifications.

Columnar storage extensions address this gap by reorganizing data for improved analytical efficiency. Performance gains stem from reduced I/O, vectorized execution, and more effective compression. This makes columnar storage useful for read-heavy analytical queries that process bulk data.

Common use cases include dashboards, data warehousing, and batch reporting. For example, querying metrics across time, filtering large event logs, or computing aggregates over large user datasets benefit significantly from the column-oriented layout. Columnar storage is also well-suited for bulk inserts and updates but is not ideal for frequent row-level changes or small transactions, which are better handled by PostgreSQL’s default row-based engine.

In the section below we cover several extensions that allow PostgreSQL users to adapt to analytical workloads without switching databases. By integrating columnar features within the PostgreSQL ecosystem, these tools enable more scalable analytics on large datasets.

Postgres Columnar Storage: Performance Considerations

While columnar storage offers clear advantages for analytical workloads, it also introduces important trade-offs and performance nuances, particularly when integrated into a row-oriented system like PostgreSQL.

Write Performance Trade-Offs

Benchmark results show that insert performance using columnar storage is generally comparable to row-based storage (heap) at smaller scales. However, as data volume increases—approaching millions of rows—columnar storage begins to demonstrate improved insert efficiency.

Update operations, on the other hand, show only marginal improvements with columnar storage. Since columnar formats are not inherently optimized for frequent or fine-grained updates, this performance profile is expected. The overhead of modifying compressed, column-oriented data limits the benefits in update-heavy use cases.

For delete operations, columnar storage shows a consistent performance advantage. The gap between columnar and heap storage widens with scale, indicating that deletion is handled more efficiently in columnar formats, possibly due to reduced row overhead and more predictable data layout.

Compression Efficiency

When using PostgreSQL extensions that implement columnar storage, it's important to carefully evaluate compression settings. Columnar formats naturally group similar data together, which enables more effective compression compared to traditional row storage. However, the choice of compression algorithm, block size, and data type optimizations can significantly influence both query speed and storage footprint. Users should benchmark different settings against their specific workloads, balancing compression ratios with decompression overhead.

Scalability and Maintenance

Adding columnar storage extensions to PostgreSQL introduces additional complexity in database management. Maintenance tasks such as vacuuming, backups, and replication may require new procedures or adjustments to existing workflows. It's also essential to monitor performance as data volume grows, since columnar tables often benefit from periodic reorganization or compaction. Proper planning for extension updates, compatibility with PostgreSQL core upgrades, is crucial to ensure scalable, reliable operation over time.

Quick Tutorial: Implementing Columnar Storage in PostgreSQL with pg_mooncake [QG3]

pg_mooncake makes it simple to add columnar storage to PostgreSQL using a familiar SQL interface. It supports local file systems or cloud-backed storage formats like Iceberg and Delta Lake. You can run pg_mooncake locally using Docker, compile it from source, or use it in managed environments like Neon Postgres.

The instructions in this tutorial were adapted from the pg_mooncake documentation.

Step 1: Install pg_mooncake

Option A: Docker (Quick Start)
Download and run the container image:

docker pull mooncakelabs/pg_mooncake

# Start a Postgres server
docker run --name mooncake-demo -e POSTGRES_HOST_AUTH_METHOD=trust -d mooncakelabs/pg_mooncake

# Connect with psql client
docker run -it --rm --link mooncake-demo:postgres mooncakelabs/pg_mooncake psql -h postgres -U postgres

Option B: Build From Source
Clone the repository and build the extension:

git clone --recurse-submodules https://github.com/Mooncake-Labs/pg_mooncake.git
cd pg_mooncake
make release -j$(nproc)
make install

Supports PostgreSQL versions 14 through 17.

Option C: Use on Neon Postgres
Create a project on Neon, then enable beta extensions:

SET neon.allow_unstable_extensions = 'true';

Step 2: Enable the Extension

After connecting to your PostgreSQL instance, enable pg_mooncake:

CREATE EXTENSION pg_mooncake;

Step 3: Create a Columnstore Table

Columnstore tables are defined using the USING columnstore clause:

CREATE TABLE user_activity (
  user_id BIGINT,
  activity_type TEXT,
  activity_timestamp TIMESTAMP,
  duration INT
) USING columnstore;

This creates a table that stores each column separately, improving performance for analytical queries.

Step 4: Insert and Query Data

You can insert data using standard SQL syntax:

INSERT INTO user_activity VALUES
  (1, 'login', '2025-02-01 09:00:00', 120),
  (2, 'page_view', '2025-02-01 09:05:00', 30),
  (3, 'logout', '2025-02-01 09:30:00', 60),
  (4, 'error', '2025-02-01 09:24:00', 60);

Querying works the same way as with regular PostgreSQL tables:

SELECT * FROM user_activity;

Improving Complex Query Performance on PostgreSQL with Epsio

While columnar storage extensions can significantly improve analytical query performance in PostgreSQL, they represent just one approach to optimizing database workloads. For organizations running complex analytical queries that require real-time insights, maintaining fast query performance as data volumes grow remains a persistent challenge.

This is where Epsio's streaming SQL engine provides a complementary solution. Rather than restructuring how data is stored, Epsio focuses on how analytical queries are executed by building incremental materialized views that update automatically as underlying data changes.

When combined with columnar storage, Epsio can provide even greater performance benefits. The columnar extensions handle storage optimization for analytical workloads, while Epsio ensures that complex aggregations, joins, and analytical queries return results in milliseconds rather than minutes or hours. This combination allows PostgreSQL to serve both as a transactional database and a high-performance analytical engine.

For PostgreSQL users implementing columnar storage who still experience slow performance on complex analytical queries, Epsio offers a powerful way to accelerate those workloads without requiring additional infrastructure changes or data movement. The streaming SQL engine works seamlessly with existing PostgreSQL deployments, including those using columnar extensions, to deliver the real-time analytical performance that modern applications demand.

‍

Deliver instant & up-to-date results for complex queries

Start free Read docs

Postgres Columnar Storage: 4 Popular Extensions and a Quick Tutorial

What Is Columnar Storage in PostgreSQL?

Benefits of Columnar Storage

The Need for Columnar Storage in PostgreSQL: Common Use Cases

Popular PostgreSQL Extensions Supporting Columnar Storage

1. Citus Columnar

2. cstore_fdw

3. pg_mooncake

4. Hydra Columnar

Postgres Columnar Storage: Performance Considerations

Write Performance Trade-Offs

Compression Efficiency

Scalability and Maintenance

Quick Tutorial: Implementing Columnar Storage in PostgreSQL with pg_mooncake [QG3]

Step 1: Install pg_mooncake

Step 2: Enable the Extension

Step 3: Create a Columnstore Table

Step 4: Insert and Query Data

Improving Complex Query Performance on PostgreSQL with Epsio

Database Materialized View vs. Regular View

Streaming Databases Explained: Real-Time Data Processing for Modern Systems

Why is stream processing hard?

Deliver instant & up-to-date results for complex queries

Postgres Columnar Storage: 4 Popular Extensions and a Quick Tutorial

What Is Columnar Storage in PostgreSQL?

Benefits of Columnar Storage

The Need for Columnar Storage in PostgreSQL: Common Use Cases

Popular PostgreSQL Extensions Supporting Columnar Storage

1. Citus Columnar

2. cstore_fdw

3. pg_mooncake

4. Hydra Columnar

Postgres Columnar Storage: Performance Considerations

Write Performance Trade-Offs

Compression Efficiency

Scalability and Maintenance

Quick Tutorial: Implementing Columnar Storage in PostgreSQL with pg_mooncake [QG3]

Step 1: Install pg_mooncake

Step 2: Enable the Extension

Step 3: Create a Columnstore Table

Step 4: Insert and Query Data

Improving Complex Query Performance on PostgreSQL with Epsio

Related Articles

Database Materialized View vs. Regular View

Streaming Databases Explained: Real-Time Data Processing for Modern Systems

Why is stream processing hard?

Related Articles

Postgres Materialized View Auto Refresh

Postgres REFRESH MATERIALIZED VIEW: A Comprehensive Guide

OLAP in Postgres: Features, Challenges, and Optimization Strategies

Deliver instant & up-to-date results for complex queries