Blog

Postgres Columnar Storage: 4 Popular Extensions and a Quick Tutorial

July 2, 2022
Gilad Kleinman

What Is Columnar Storage in PostgreSQL? 

PostgreSQL is a row-oriented database by design, meaning it stores data tuple-by-tuple, with all column values of a row stored together. This structure is suitable for transactional workloads but not optimized for analytical queries that typically scan large volumes of data across a few columns.

Columnar storage stores data column-by-column. This layout allows systems to read only the relevant columns needed for a query, reducing I/O and improving performance for analytical workloads.

While PostgreSQL does not natively support columnar storage, several extensions and external tools introduce columnar capabilities. These solutions allow PostgreSQL to behave more like a columnar database, enabling it to handle analytical workloads more efficiently without departing from its core architecture.

Benefits of Columnar Storage 

Columnar databases have become popular in recent years. They offer several advantages, particularly for analytical and data-intensive workloads. While PostgreSQL is not natively a columnar database, there are several workarounds that can achieve some of these benefits:

  • Improved query performance: Since only the necessary columns are read from disk, queries run faster, reducing overall execution time.
  • Better compression: Columnar storage allows higher compression rates as similar data is stored together, reducing storage costs and improving I/O performance.
  • Efficient analytical workloads: Optimized for analytics, this storage format improves aggregation, filtering, and scanning operations.
  • Reduced I/O overhead: By reading only relevant columns, columnar storage significantly decreases disk I/O, leading to better resource utilization.
  • Optimized CPU usage: With reduced data processing needs, the CPU spends less time on irrelevant data, improving query efficiency and overall database performance.
  • Scalability for large datasets: Columnar storage enables PostgreSQL to handle massive datasets efficiently.

The Need for Columnar Storage in PostgreSQL: Common Use Cases 

Traditional PostgreSQL storage is optimized for transactional (OLTP) workloads, where operations typically access full rows. However, analytical (OLAP) workloads behave differently—they often scan large volumes of data across a small number of columns. This mismatch creates performance challenges for analytics on PostgreSQL without modifications.

Columnar storage extensions address this gap by reorganizing data for improved analytical efficiency. Performance gains stem from reduced I/O, vectorized execution, and more effective compression. This makes columnar storage useful for read-heavy analytical queries that process bulk data.

Common use cases include dashboards, data warehousing, and batch reporting. For example, querying metrics across time, filtering large event logs, or computing aggregates over large user datasets benefit significantly from the column-oriented layout. Columnar storage is also well-suited for bulk inserts and updates but is not ideal for frequent row-level changes or small transactions, which are better handled by PostgreSQL’s default row-based engine.

In the section below we cover several extensions that allow PostgreSQL users to adapt to analytical workloads without switching databases. By integrating columnar features within the PostgreSQL ecosystem, these tools enable more scalable analytics on large datasets.

Popular PostgreSQL Extensions Supporting Columnar Storage 

1. Citus Columnar

License: AGPL-3.0

Repository: https://github.com/citusdata/citus/tree/main/src/backend/columnar

GitHub stars: 11K+

Contributors: 100+

Citus Columnar is built on top of the Citus extension for PostgreSQL, enabling distributed data storage with a columnar format. It combines horizontal scale-out with efficient storage, making it suitable for large datasets. Citus Columnar segments data into smaller, manageable chunks enabling faster queries across distributed systems. 

Another feature of Citus Columnar is its integration with existing PostgreSQL tooling. This ensures users can transition to a columnar-based analytical framework with minimal disruptions. By utilizing PostgreSQL's ecosystem, Citus Columnar offers a flexible option for organizations looking to optimize their data infrastructure.

2. cstore_fdw

License: Apache-2.0

Repository: https://github.com/citusdata/cstore_fdw

GitHub stars: 1K+

Contributors: 10+

cstore_fdw is an extension for PostgreSQL that implements columnar storage by using foreign data wrappers. It handles large volume analytical workloads, improving performance by storing data in a highly compressed columnar format. The extension supports different compression algorithms, which help in reducing storage needs and improving query speed. 

This extension makes dimensional data analysis and big data operations faster and more efficient compared to traditional row-based storage. cstore_fdw integrates with the PostgreSQL ecosystem, allowing users to continue using familiar SQL queries while leveraging the benefits of columnar storage. With native support for various analytical functions, cstore_fdw mandates less data movement for analytical tasks.

3. pg_mooncake

License: MIT

Repository: https://github.com/Mooncake-Labs/pg_mooncake

GitHub stars: 1K+

Contributors: 10+

pg_mooncake is a PostgreSQL extension that focuses on improving read efficiency through advanced data structuring and compression mechanisms. By tailoring storage to analytical needs, pg_mooncake helps in reducing computational overhead and operational delays. As a result, it improves data retrieval speeds for complex queries that involve large datasets across multiple columns.

pg_mooncake extends PostgreSQL's capabilities while maintaining easy-to-use interfaces. This extension caters to organizations needing intensive analytical processing without drastically altering their existing database setups. With support for both real-time and batch processing, it accommodates growing data analytics demands.

4. Hydra Columnar

License: Apache-2.0

Repository: https://github.com/hydradatabase/columnar

GitHub stars: 2K+

Contributors: 10+

Hydra Columnar is a modern extension supporting columnar storage within PostgreSQL, focused on improving large-scale data analysis. It allows users to process vast amounts of data efficiently, reducing the time needed for heavy query loads. Hydra Columnar separates data into independent columns, allowing localized access, which optimizes the reading of frequently queried data columns and thus saves resources and time. 

This data layout boosts the performance of data analytics workflows. It also features sophisticated data compression techniques that further improve storage efficiency and query performance. The extension's compatibility with existing PostgreSQL infrastructure allows for smooth transitions.

Postgres Columnar Storage: Performance Considerations 

While columnar storage offers clear advantages for analytical workloads, it also introduces important trade-offs and performance nuances, particularly when integrated into a row-oriented system like PostgreSQL.

Write Performance Trade-Offs

Benchmark results show that insert performance using columnar storage is generally comparable to row-based storage (heap) at smaller scales. However, as data volume increases—approaching millions of rows—columnar storage begins to demonstrate improved insert efficiency.

Update operations, on the other hand, show only marginal improvements with columnar storage. Since columnar formats are not inherently optimized for frequent or fine-grained updates, this performance profile is expected. The overhead of modifying compressed, column-oriented data limits the benefits in update-heavy use cases.

For delete operations, columnar storage shows a consistent performance advantage. The gap between columnar and heap storage widens with scale, indicating that deletion is handled more efficiently in columnar formats, possibly due to reduced row overhead and more predictable data layout.

Compression Efficiency

When using PostgreSQL extensions that implement columnar storage, it's important to carefully evaluate compression settings. Columnar formats naturally group similar data together, which enables more effective compression compared to traditional row storage. However, the choice of compression algorithm, block size, and data type optimizations can significantly influence both query speed and storage footprint. Users should benchmark different settings against their specific workloads, balancing compression ratios with decompression overhead.

Scalability and Maintenance

Adding columnar storage extensions to PostgreSQL introduces additional complexity in database management. Maintenance tasks such as vacuuming, backups, and replication may require new procedures or adjustments to existing workflows. It's also essential to monitor performance as data volume grows, since columnar tables often benefit from periodic reorganization or compaction. Proper planning for extension updates, compatibility with PostgreSQL core upgrades, is crucial to ensure scalable, reliable operation over time.

Quick Tutorial: Implementing Columnar Storage in PostgreSQL with pg_mooncake [QG3]

pg_mooncake makes it simple to add columnar storage to PostgreSQL using a familiar SQL interface. It supports local file systems or cloud-backed storage formats like Iceberg and Delta Lake. You can run pg_mooncake locally using Docker, compile it from source, or use it in managed environments like Neon Postgres.

The instructions in this tutorial were adapted from the pg_mooncake documentation.

Step 1: Install pg_mooncake

Option A: Docker (Quick Start)
Download and run the container image:

docker pull mooncakelabs/pg_mooncake

# Start a Postgres server
docker run --name mooncake-demo -e POSTGRES_HOST_AUTH_METHOD=trust -d mooncakelabs/pg_mooncake

# Connect with psql client
docker run -it --rm --link mooncake-demo:postgres mooncakelabs/pg_mooncake psql -h postgres -U postgres

Option B: Build From Source
Clone the repository and build the extension:

git clone --recurse-submodules https://github.com/Mooncake-Labs/pg_mooncake.git
cd pg_mooncake
make release -j$(nproc)
make install

Supports PostgreSQL versions 14 through 17.

Option C: Use on Neon Postgres
Create a project on Neon, then enable beta extensions:

SET neon.allow_unstable_extensions = 'true';

Step 2: Enable the Extension

After connecting to your PostgreSQL instance, enable pg_mooncake:

CREATE EXTENSION pg_mooncake;

Step 3: Create a Columnstore Table

Columnstore tables are defined using the USING columnstore clause:

CREATE TABLE user_activity (
  user_id BIGINT,
  activity_type TEXT,
  activity_timestamp TIMESTAMP,
  duration INT
) USING columnstore;

This creates a table that stores each column separately, improving performance for analytical queries.

Step 4: Insert and Query Data

You can insert data using standard SQL syntax:

INSERT INTO user_activity VALUES
  (1, 'login', '2025-02-01 09:00:00', 120),
  (2, 'page_view', '2025-02-01 09:05:00', 30),
  (3, 'logout', '2025-02-01 09:30:00', 60),
  (4, 'error', '2025-02-01 09:24:00', 60);

Querying works the same way as with regular PostgreSQL tables:

SELECT * FROM user_activity;

Improving Complex Query Performance on PostgreSQL with Epsio

While columnar storage extensions can significantly improve analytical query performance in PostgreSQL, they represent just one approach to optimizing database workloads. For organizations running complex analytical queries that require real-time insights, maintaining fast query performance as data volumes grow remains a persistent challenge.

This is where Epsio's streaming SQL engine provides a complementary solution. Rather than restructuring how data is stored, Epsio focuses on how analytical queries are executed by building incremental materialized views that update automatically as underlying data changes.

When combined with columnar storage, Epsio can provide even greater performance benefits. The columnar extensions handle storage optimization for analytical workloads, while Epsio ensures that complex aggregations, joins, and analytical queries return results in milliseconds rather than minutes or hours. This combination allows PostgreSQL to serve both as a transactional database and a high-performance analytical engine.

For PostgreSQL users implementing columnar storage who still experience slow performance on complex analytical queries, Epsio offers a powerful way to accelerate those workloads without requiring additional infrastructure changes or data movement. The streaming SQL engine works seamlessly with existing PostgreSQL deployments, including those using columnar extensions, to deliver the real-time analytical performance that modern applications demand.

Deliver instant & up-to-date results for complex queries