Elisa Marks

Hypothetical Indexes in PostgreSQL

Explore the need and usage of hypothetical indexes in PostgreSQL.

In this blog, we will first cover what hypothetical indexes are and why they are needed. Then we will look at their practical usage.

The Need for Hypothetical Indexes:

As the name suggests, these are not real indexes. They are hypothetical, i.e. virtual, indexes: the PostgreSQL query planner can take them into account when estimating plans, but they are never actually built, so executing queries never use them.

Now the question arises: where are these **hypothetical indexes** useful?

First, let’s discuss one scenario. We have a large table in the production environment and need to create an index on the live database, but we are not sure whether that index will be useful. We don’t even know whether creating it might bring the production environment down!

So, the possible solutions to the above problem are the following:

  • Option 1: ignore the risk and create the index on the live table, which can play out as follows:
  • Creating the index will take a long time, depending on how much data the table holds.
  • Live queries may be affected badly if we don’t know whether the new index will decrease or increase their cost.
  • We also don’t know the size of the index in advance; it may be too large, which can again impact the production database server.
  • Option 2: replicate the production database to a local dev environment, try all the candidate indexes there, and then apply the winners in production. This is a very safe and effective approach in almost all cases, but it takes a lot of time to set everything up and to test with realistic scenarios.
  • Option 3: hypothetical indexes. This functionality creates imaginary indexes, not real ones. There are some things to note about these indexes:
  • The index is created in our connection’s private memory. Neither the catalog tables nor the original table are actually touched.
  • The only way to see whether we would benefit from the index is to run EXPLAIN on the query.
  • EXPLAIN ANALYZE, which actually executes the SQL and reports runtime stats, cannot use a hypothetical index, because the index does not exist in reality.
  • Currently, hypothetical indexes work on B-tree indexes only. However, you can experiment with other index types.
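
In practice, this is what the hypopg extension (covered later in this series) provides. As a minimal sketch, assuming hypopg is installed and using a made-up orders table with a customer_id column:

-- Create a hypothetical index in this session only; nothing is written to disk.
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');

-- EXPLAIN (without ANALYZE) shows whether the planner would use it.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

If the plan switches from a sequential scan to an index scan, the real index is likely worth creating.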

#database #postgresql #postgres

Charity Ferry

How to Use PostgreSQL's Automatic Index Recommendations

In our last blog, we learned about the need for and usage of hypothetical indexes in PostgreSQL. With them, we can easily check in a live environment whether a particular index would be helpful. But how do we know which index to test in the first place? That normally requires in-depth knowledge of indexing and experience with PostgreSQL. Fortunately, PostgreSQL can give us automatic index recommendations for specific queries by combining three extensions: hypopg, pg_stat_statements, and pg_qualstats.
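
As a sketch of what this looks like (the advisor function is provided by pg_qualstats version 2 and above; the suggestions depend entirely on your recorded workload):

CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION pg_qualstats;
-- Both must also appear in shared_preload_libraries, followed by a server restart.

-- After the workload has run for a while, ask for index suggestions:
SELECT v AS suggested_index
  FROM json_array_elements(pg_qualstats_index_advisor()->'indexes') v;

Each suggested CREATE INDEX statement can then be validated with hypopg before being created for real.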

#postgresql #automatic index recommendations #automatic #index

Shawn Durgan

MongoDB Indexes: A Deep Dive into Understanding Indexes

Get a performance boost by making the best use of indexes: understand the underlying data structure, how indexes are stored on disk and loaded into memory, and how the query optimizer decides which index to select.

A basic understanding of indexes is required, i.e. what indexes are, the index types, and how to create them: https://docs.mongodb.com/manual/indexes/

  • Data Structure
  • Storage on Disk
  • Memory Allocation

Data Structure

An index on a field (or fields) is stored in the order we specify, using a B-Tree data structure. Let’s see what storing in order means and how it helps.

  • The index is created on the value of the field, with each entry referencing the actual stored document.

(Image: snipped from MongoDB University)

  • Using B-Tree indexes significantly reduces the number of comparisons needed to find a document.

(Image: snipped from MongoDB University)

  • Likewise, in the picture below, we can see that with an index (sky-blue line), the number of documents examined stays limited even as documents are added, compared to an unindexed collection scan (collscan).

(Image: snipped from MongoDB University)
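
You can observe this yourself with explain("executionStats") (a mongosh sketch; the movieTicket collection is reused from later in this post, and the date value is made up):

// Compare documents examined with and without an index.
db.movieTicket.createIndex({ showDate: 1 });

const stats = db.movieTicket
  .find({ showDate: "2021-06-01" })
  .explain("executionStats")
  .executionStats;

// With the index, totalDocsExamined stays close to nReturned;
// a collection scan would examine every document instead.
printjson({ examined: stats.totalDocsExamined, returned: stats.nReturned });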

Storage On Disk

Let’s visualize how indexes are stored on disk. On-disk index storage is managed by the database storage engine itself.

  • The storage engine uses index prefix compression: a repeated prefix value is not written again. Let’s look at an example to understand what this means.
db.getCollection("movieTicket")
  .createIndex({ "showDate": 1, "seatNo": 1, "status": 1 });

This is how the index ({“showDate”:1, “seatNo”:1, “status”:1}) is stored on disk:


showDate_1_seatNo_1_status_1
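
Conceptually (a simplified sketch, not the actual on-disk byte layout; the key values are made up), prefix compression means repeated leading key parts are stored only once:

Full key entries:                    Stored with prefix compression:
("2021-06-01", "A1", "booked")      ("2021-06-01", "A1", "booked")
("2021-06-01", "A2", "booked")      ( .., "A2", "booked")
("2021-06-01", "A3", "free")        ( .., "A3", "free")

The more keys that share a common prefix (here the showDate), the more space is saved.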

#index #mongodb #indexing #mongo


Luz Runolfsson

Hypothetical Indexes for PostgreSQL

HypoPG

HypoPG is a PostgreSQL extension adding support for hypothetical indexes.

A hypothetical, or virtual, index is an index that doesn’t really exist, and thus costs no CPU, disk, or other resources to create. Hypothetical indexes are useful for finding out whether specific indexes can increase performance for problematic queries, since you can see whether PostgreSQL would use these indexes without having to spend the resources to create them.

For more thorough information, please consult the official documentation.

For other general information, you can also consult this blog post.

Installation

  • Compatible with PostgreSQL 9.2 and above
  • Needs PostgreSQL header files
  • Decompress the tarball
  • sudo make install
  • In every needed database: CREATE EXTENSION hypopg;

Updating the extension

Note that hypopg doesn’t provide extension upgrade scripts, as there’s no data saved in any of the objects created. Therefore, you need to first drop the extension then create it again to get the new version.

Usage

NOTE: The hypothetical indexes are contained in a single backend. Therefore, if you add multiple hypothetical indexes, concurrent connections doing EXPLAIN won’t be bothered by your hypothetical indexes.

Assuming a simple test case:

rjuju=# CREATE TABLE hypo AS SELECT id, 'line ' || id AS val FROM generate_series(1,10000) id;
rjuju=# EXPLAIN SELECT * FROM hypo WHERE id = 1;
                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on hypo  (cost=0.00..180.00 rows=1 width=13)
   Filter: (id = 1)
(2 rows)

The easiest way to create a hypothetical index is to use the hypopg_create_index function with a regular CREATE INDEX statement as its argument.

For instance:

rjuju=# SELECT * FROM hypopg_create_index('CREATE INDEX ON hypo (id)');

NOTE: Some information from the CREATE INDEX statement will be ignored, such as the index name if provided. Some of the ignored information will be handled in a future release.

You can check the available hypothetical indexes in your own backend:

rjuju=# SELECT * FROM hypopg_list_indexes();
 indexrelid |                 indexname                 | nspname | relname | amname
 -----------+-------------------------------------------+---------+---------+--------
     205101 | <41072>btree_hypo_id                      | public  | hypo    | btree

If you need more technical information on the hypothetical indexes, the hypopg() function will return them in a similar way to the pg_index system catalog.

And now, let’s see if your previous EXPLAIN statement would use such an index:

rjuju=# EXPLAIN SELECT * FROM hypo WHERE id = 1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Index Scan using <41072>btree_hypo_id on hypo  (cost=0.29..8.30 rows=1 width=13)
   Index Cond: (id = 1)
(2 rows)

Of course, only EXPLAIN without ANALYZE will use hypothetical indexes:

rjuju=# EXPLAIN ANALYZE SELECT * FROM hypo WHERE id = 1;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Seq Scan on hypo  (cost=0.00..180.00 rows=1 width=13) (actual time=0.036..6.072 rows=1 loops=1)
   Filter: (id = 1)
   Rows Removed by Filter: 9999
 Planning time: 0.109 ms
 Execution time: 6.113 ms
(5 rows)

To remove your backend’s hypothetical indexes, you can use the hypopg_drop_index(indexrelid) function with the OID that hypopg_list_indexes() returns, call hypopg_reset() to remove them all at once, or simply close your current connection.
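
Using the OID from the listing above, that is:

rjuju=# SELECT hypopg_drop_index(205101);  -- drop one hypothetical index
rjuju=# SELECT hypopg_reset();             -- or drop them all at once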

Download Details:

Author: HypoPG
Official Website: https://github.com/HypoPG/hypopg

#postgresql #database

PostgreSQL Connection Pooling: Part 4 – PgBouncer vs. Pgpool-II

In our previous posts in this series, we spoke at length about using PgBouncer and Pgpool-II, the connection pool architecture, and the pros and cons of leveraging one for your PostgreSQL deployment. In our final post, we will put them head-to-head in a detailed feature comparison and compare the results of PgBouncer vs. Pgpool-II performance for your PostgreSQL hosting!

The bottom line – Pgpool-II is a great tool if you need load-balancing and high availability. Connection pooling is almost a bonus you get alongside. PgBouncer does only one thing, but does it really well. If the objective is to limit the number of connections and reduce resource consumption, PgBouncer wins hands down.

It is also perfectly fine to use both PgBouncer and Pgpool-II in a chain – you can have a PgBouncer to provide connection pooling, which talks to a Pgpool-II instance that provides high availability and load balancing. This gives you the best of both worlds!
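
As a sketch of such a chain (hostnames and ports are made up; consult each tool’s documentation for the full option set), the pgbouncer.ini databases section would point at the Pgpool-II listener instead of at PostgreSQL directly:

; pgbouncer.ini (fragment)
[databases]
; route through Pgpool-II (listening on 9999) rather than PostgreSQL (5432)
appdb = host=pgpool.internal port=9999 dbname=appdb

[pgbouncer]
listen_port = 6432
pool_mode = transaction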

(Diagram: Using PgBouncer with Pgpool-II for connection pooling)

Performance Testing

While PgBouncer may seem to be the better option in theory, theory can often be misleading. So, we pitted the two connection poolers head-to-head, using the standard pgbench tool, to see which one provides better transactions per second throughput through a benchmark test. For good measure, we ran the same tests without a connection pooler too.

Testing Conditions

All of the PostgreSQL benchmark tests were run under the following conditions:

  1. Initialized pgbench using a scale factor of 100.
  2. Disabled auto-vacuuming on the PostgreSQL instance to prevent interference.
  3. No other workload was working at the time.
  4. Used the default pgbench script to run the tests.
  5. Used default settings for both PgBouncer and Pgpool-II, except max_children*. All PostgreSQL limits were also set to their defaults.
  6. All tests ran as a single thread, on a single-CPU, 2-core machine, for a duration of 5 minutes.
  7. Forced pgbench to create a new connection for each transaction using the -C option. This emulates modern web application workloads and is the whole reason to use a pooler!
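
Conditions 1, 6, and 7 above translate roughly into the following pgbench invocations (a sketch; the client count was varied per run, and connection options and the benchdb database name are placeholders):

# Initialize with a scale factor of 100
pgbench -i -s 100 benchdb

# 5-minute run, forcing a new connection per transaction
pgbench -C -c 20 -T 300 benchdb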

We ran each iteration for 5 minutes to ensure any noise averaged out. Here is how the middleware was installed:

  • For PgBouncer, we installed it on the same box as the PostgreSQL server(s). This is the configuration we use in our managed PostgreSQL clusters. Since PgBouncer is a very light-weight process, installing it on the box has no impact on overall performance.
  • For Pgpool-II, we tested both when the Pgpool-II instance was installed on the same machine as PostgreSQL (on box column), and when it was installed on a different machine (off box column). As expected, the performance is much better when Pgpool-II is off the box as it doesn’t have to compete with the PostgreSQL server for resources.

Throughput Benchmark

Here are the transactions per second (TPS) results for each scenario across a range of client counts:

#database #developer #performance #postgresql #connection control #connection pooler #connection pooler performance #connection queue #high availability #load balancing #number of connections #performance testing #pgbench #pgbouncer #pgbouncer and pgpool-ii #pgbouncer vs pgpool #pgpool-ii #pooling modes #postgresql connection pooling #postgresql limits #resource consumption #throughput benchmark #transactions per second #without pooling