Hashed indexes

A hashed index in MongoDB stores a hash of the indexed field's value rather than the value itself. Its main use is hashed sharding, where it spreads documents more evenly across shards than a ranged key does, especially when the key increases monotonically. Hashed indexes support equality matches but cannot serve range queries or sorts on the hashed field.

How to Create a Hashed Index

To create a hashed index, call the createIndex method and specify "hashed" as the index type for the field.

// Create a hashed index on the "username" field
db.users.createIndex({ "username": "hashed" })

Features of Hashed Indexes

Hashed indexes serve a narrow set of use cases in MongoDB, chiefly hashed sharding. The points below describe what they offer and note where a benefit depends on the data:

1. Sharding Support

  • Uniform Distribution: Hashed indexes provide a more uniform distribution of data across shards, reducing the risk of hotspots.

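  • Shard Key: They are often used as the shard key in a sharded cluster so that data is spread evenly across shards. If the collection is empty, sh.shardCollection creates the hashed index automatically; if it already holds documents, create the hashed index first.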

    // Sharding a collection using a hashed index
    sh.shardCollection("mydb.users", { "username": "hashed" })
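
To see why hashing helps with a monotonically increasing key, the following Node.js sketch compares the two placement schemes. It is only an illustration: MD5 truncated to 64 bits stands in for the server's internal hash, and `hashedShard`, `rangedShard`, the four-shard layout, and the equal-width ranges are all assumptions made for the example, not MongoDB's actual chunk logic.

```javascript
// Illustrative only: MD5 stands in for the server's internal hash function.
const crypto = require("crypto");

const SHARDS = 4; // assumed shard count for the example

// Map a key to a 64-bit unsigned hash taken from the first 8 bytes of its MD5 digest.
function hash64(key) {
  const digest = crypto.createHash("md5").update(String(key)).digest();
  return digest.readBigUInt64BE(0);
}

// Hashed placement: split the 64-bit hash space into SHARDS equal ranges.
function hashedShard(key) {
  const rangeSize = (1n << 64n) / BigInt(SHARDS);
  return Number(hash64(key) / rangeSize);
}

// Ranged placement: split the key space [0, maxKey) into SHARDS equal ranges.
function rangedShard(key, maxKey) {
  return Math.min(SHARDS - 1, Math.floor((key * SHARDS) / maxKey));
}

// Count how many keys land on each shard under a placement function.
function distribution(keys, place) {
  const counts = new Array(SHARDS).fill(0);
  for (const key of keys) counts[place(key)] += 1;
  return counts;
}

// The 1,000 most recent values of a monotonically increasing key (for example, a counter).
const maxKey = 100000;
const recentKeys = Array.from({ length: 1000 }, (_, i) => maxKey - 1000 + i);

const hashedCounts = distribution(recentKeys, hashedShard);
const rangedCounts = distribution(recentKeys, (k) => rangedShard(k, maxKey));

console.log("hashed:", hashedCounts);
console.log("ranged:", rangedCounts);
```

With ranged placement every recent key falls in the last range, so one shard would take all new writes. With hashed placement the same keys scatter across all four ranges.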

2. Fast Equality Queries

  • Optimized for Equality: Hashed indexes are optimized for equality-based queries and can quickly locate documents based on the hashed field.

    // Find a user with a specific username
    db.users.find({ "username": "john_doe" })
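
One way to confirm that an equality query really uses the hashed index is to inspect the output of explain(). The helper below is a minimal sketch: `findStage` and `samplePlan` are hypothetical names, and the sample plan is a trimmed, hand-written approximation of explain output whose exact layout varies between server versions.

```javascript
// Recursively search an explain() result for a stage with the given name.
// Walking every nested object keeps the helper independent of the exact plan layout.
function findStage(node, stageName) {
  if (node === null || typeof node !== "object") return null;
  if (node.stage === stageName) return node;
  for (const value of Object.values(node)) {
    const found = findStage(value, stageName);
    if (found) return found;
  }
  return null;
}

// Hypothetical, trimmed explain output for an equality match on a hashed index.
const samplePlan = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: {
        stage: "IXSCAN",
        indexName: "username_hashed",
        keyPattern: { username: "hashed" },
      },
    },
  },
};

const scan = findStage(samplePlan, "IXSCAN");
console.log(scan ? `uses index ${scan.indexName}` : "no index scan found");
```

In mongosh, the same helper could be given a real plan, for example findStage(db.users.find({ "username": "john_doe" }).explain(), "IXSCAN").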

3. Storage Efficiency

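  • Compact Size: Each index key is a fixed-size 64-bit hash regardless of the length of the original value, so a hashed index can be smaller than a regular index when the indexed values are long (for example, long strings). For short values such as integers there is little or no saving.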

4. Cache Efficiency

  • Better Cache Utilization: When the hashed keys are smaller than the original values, more of the index fits in memory, which can benefit read-heavy workloads.

5. Predictable Query Performance

  • Consistent Lookups: Equality lookups compare fixed-size hashes, so lookup cost does not vary with the length of the values in the indexed field.

6. Simplified Index Management

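  • Single Hashed Field: An index can contain only one hashed field, which keeps its definition simple. Before MongoDB 4.4 a hashed index had to be a single-field index; from 4.4 onward, a compound index may include one hashed field alongside non-hashed fields.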

7. Handling High-Cardinality Fields

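  • Efficiency: Hashed sharding works best on fields with very high cardinality (i.e., a large number of unique values), because many distinct hashes are needed to spread documents evenly. Note that a hashed index is still stored as a B-tree; it is keyed on the hash rather than on the raw value.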

Considerations and Limitations

  1. No Range Queries: Hashed indexes are not suitable for range-based queries ($gt, $lt, etc.) or sorting operations.

  2. Hash Collisions: Two different values can produce the same 64-bit hash, although this is rare. Collisions do not cause wrong results, because the query still checks the field value in each fetched document; they only add a small amount of extra work.

  3. Limited Query Support: Hashed indexes support equality matches on the hashed field, including $in. Queries that use other operators on that field, or that sort by it, cannot use the hashed index.

  4. No Text or Geospatial Support: Hashed indexes cannot be used for text search or geospatial queries.

  5. CPU Overhead: Hashing operations can add some CPU overhead, although this is generally minimal.
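
The range-query limitation follows directly from how hashing works: values that are adjacent in sort order receive unrelated hashes. The Node.js sketch below illustrates this under the assumption that MD5 truncated to 64 bits is a reasonable stand-in for the server's hash; `hash64`, `inRange`, and the sample usernames are hypothetical.

```javascript
const crypto = require("crypto");

// Illustrative 64-bit hash: first 8 bytes of MD5 (an assumption, not the server's exact function).
function hash64(value) {
  return crypto.createHash("md5").update(value).digest().readBigUInt64BE(0);
}

const usernames = ["alice", "bob", "carol", "dave", "erin", "frank"];

// Order the names the way an index keyed on the hash would store them.
const byHash = [...usernames].sort((a, b) => {
  const ha = hash64(a);
  const hb = hash64(b);
  return ha < hb ? -1 : ha > hb ? 1 : 0;
});

// A range such as "bob" <= username <= "dave" is contiguous when ordered by value...
const inRange = (name) => name >= "bob" && name <= "dave";
const mark = (name) => (inRange(name) ? `${name}*` : name);

console.log("by value:", usernames.map(mark).join(" "));
// ...but the matching names (marked *) may land anywhere when ordered by hash.
console.log("by hash: ", byHash.map(mark).join(" "));
```

Because the matches are not guaranteed to sit next to each other in hash order, an index keyed on the hash has no contiguous span to scan for a range. If range queries or sorting on the same field are needed, a regular ascending index such as { "username": 1 } can be created alongside the hashed one.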