Optimising PostgreSQL database queries using indexes

Database Indexes

An index is a special lookup structure that the database search engine can use to speed up data retrieval. An index works like a pointer to a particular row of a table. As a real-world example, consider the Encyclopaedia Britannica: alongside its 22 volumes there is an extra index volume, with which you can find the item you are searching for in those 22 books. PostgreSQL (as of 9.5) provides several index algorithms: B-tree, Hash, GiST, SP-GiST and GIN. When you create an index through a Ruby on Rails migration, it uses the B-tree algorithm by default. Before we move on to the indexing algorithms themselves, though, we should look at the general classification of indexes and the data to be indexed.

Primary keys

In Rails, when we generate a model, an id column is added to the table by default. Its values are integers, and they are unique. When you set a column as the primary key, the database builds a unique index on it automatically, so we don't need a migration for it.

Foreign keys

When you build a relationship between two tables, you add a foreign key in the child table to point to the parent, e.g. user_id or group_id. We query through these relationships a lot in Rails, for example to load all the comments of a post or all the members of a group, so we need to index them for speed.

If you are using some non-id value to reference a table (say your application gives every user a unique URL containing the username, e.g. http://csnipp.com/coderhs), then the username is what we query by, so it needs an index as well. In fact, you should index all the columns you use in your WHERE clauses. If you frequently search for users of a particular age or income in your reports, create indexes for those columns too.

The indexes described above can be single column or multi column. If you index just a single column of a table, it is a single column index:

CREATE INDEX index_prices_on_user_id
ON prices (user_id);
Rails code:
add_index :prices, :user_id
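As a concrete case, the username lookup mentioned earlier calls for a unique single column index. Here is a sketch of the migration call and the SQL PostgreSQL receives; the helper below only renders the statement for illustration, and the table and column names are assumptions:

```ruby
# Sketch only: renders the SQL that a migration like
#   add_index :users, :username, unique: true
# would produce. Names here are illustrative, not from a real schema.
def index_sql(table, column, unique: false)
  name = "index_#{table}_on_#{column}"
  "CREATE #{unique ? 'UNIQUE INDEX' : 'INDEX'} #{name} ON #{table} (#{column});"
end

puts index_sql(:users, :username, unique: true)
# prints: CREATE UNIQUE INDEX index_users_on_username ON users (username);
```

The unique option both enforces uniqueness and gives PostgreSQL a fast path for the single-row lookup that a URL like /coderhs triggers.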
When we index multiple columns together, it is called a multi column index:
CREATE INDEX index_user_views_on_user_id_and_article_id
ON user_views (user_id, article_id);
Rails code:
add_index :user_views, [:user_id, :article_id]
If you join two tables on a column that is not an already indexed foreign key, you should index that column as well.

State and boolean columns

State and boolean columns should be indexed when the table has many rows but the column holds only a limited set of values. A boolean column has just two values (true and false), and a state column has a few more, e.g. draft, published, pending. Lookups on these columns are fast because only a handful of keys exist in the index, and a single lookup loads them.

Partial indexes can be used in this case. As the name suggests, a partial index covers only the subset of a table that satisfies a given condition. They are most useful when writing scopes in Rails. Say you have a scope that picks up all the articles marked as spam. In your model you would write a scope like this:
scope :spam, -> { where(spam: true) }
Since this is just a WHERE query under the hood, we can back it with a partial index:
CREATE INDEX index_articles_on_spam
ON articles (spam) WHERE spam IS TRUE;
Rails code:
add_index :articles, :spam, name: "index_articles_on_spam", where: "spam IS TRUE"
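The where: option is what makes this a partial index; it becomes a trailing WHERE clause in the generated SQL. A small sketch of that mapping (the helper is illustrative, not real Rails internals):

```ruby
# Sketch only: shows how add_index's :where option maps onto the SQL,
# e.g. add_index :articles, :spam, where: "spam IS TRUE".
# This helper just renders the statement for illustration.
def partial_index_sql(table, column, where:)
  name = "index_#{table}_on_#{column}"
  "CREATE INDEX #{name} ON #{table} (#{column}) WHERE #{where};"
end

puts partial_index_sql(:articles, :spam, where: "spam IS TRUE")
# prints: CREATE INDEX index_articles_on_spam ON articles (spam) WHERE spam IS TRUE;
```

PostgreSQL will only consider a partial index for queries whose conditions imply the index predicate, which is exactly what the spam scope produces.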
When not to use indexes

Indexes speed up SELECT and WHERE, but they slow down INSERT and UPDATE, because every write has to update the index as well. So avoid indexing in situations like these:

A table with a huge volume of writes
Avoid indexing a table that holds a large amount of raw data and takes a lot of batch inserts and updates. For example, in an IoT application we might pipe the data from multiple devices into a single table, summarise it, and insert the results into their proper tables. By a lot of data, I mean at least 10+ MB per minute. In most cases we would just truncate that table after processing, so an index would only slow us down.

A table that is, and will stay, small
If you have a settings table that just stores the application settings modified through an admin panel, an index there is not worth it.

A column whose values change constantly
If the value of a column changes extremely often, like a website view count, indexing it is not recommended.

Finally, to complete this article: if you want to drop an index:

SQL:
DROP INDEX index_books_on_created_at;
Rails code:
remove_index :books, :created_at
To summarise:
  • Index primary keys
  • Index foreign keys
  • Index all columns you pass into WHERE clauses
  • Index the keys used to join tables
  • Index date columns you query frequently (e.g. rankings for a particular date)
  • Index the type column in STI or polymorphic associations
  • Add partial indexes to back your scopes
  • Do not index tables that take heavy writes
  • Do not index tables you know will remain small throughout their lifetime
  • Do not index columns whose values you manipulate constantly
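On the STI/polymorphism point above: with a polymorphic association, Rails filters on both the type and the id column, so the usual choice is a composite index covering the pair. A sketch, where the comments table and the commentable association are assumptions:

```ruby
# Sketch only: a polymorphic `commentable` association is queried on
# (commentable_type, commentable_id), so index both columns together,
# e.g. add_index :comments, [:commentable_type, :commentable_id].
# The helper below just renders the corresponding SQL for illustration.
def composite_index_sql(table, columns)
  name = "index_#{table}_on_#{columns.join('_and_')}"
  "CREATE INDEX #{name} ON #{table} (#{columns.join(', ')});"
end

puts composite_index_sql(:comments, [:commentable_type, :commentable_id])
```

Ordering the type column first lets the same index also serve queries that filter on type alone.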
Keep visiting to learn more about the PostgreSQL indexing algorithms themselves in part 2 of this article.