Hashed indexes
A hashed index in MongoDB is a type of index that stores the hash of the field value instead of the value itself. Hashed indexes are particularly useful for sharding collections, as they provide a more even distribution of data across shards. They are also useful for equality-based queries but are not suitable for range queries or sorting.
How to Create a Hashed Index
You can create a hashed index using the createIndex method and specifying the index type as "hashed".
// Create a hashed index on the "username" field
db.users.createIndex({ "username": "hashed" })
Features of Hashed Indexes
Hashed indexes serve specific use cases in MongoDB, chiefly hashed sharding and equality lookups. Here is a closer look at what they offer:
1. Sharding Support
Uniform Distribution: Hashed sharding spreads documents across shards more uniformly than ranged sharding on a monotonically increasing key (such as a timestamp or counter), reducing the risk of hotspots.
Shard Key: A hashed index on the shard key field lets MongoDB partition data by hash value, so documents with adjacent key values are likely to land on different chunks and servers.
// Sharding a collection using a hashed index
sh.shardCollection("mydb.users", { "username": "hashed" })
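To make the hotspot argument concrete, here is a small toy simulation in plain JavaScript. It is not MongoDB code: the `fnv1a` hash, the fixed chunk boundaries, and the modulo placement are all assumptions chosen for illustration (MongoDB uses its own hash function and assigns ranges of hash values to shards). It only shows why a monotonically increasing key overloads one shard under ranged placement but not under hashed placement.

```javascript
// Toy simulation, not MongoDB code: compare where new documents land under
// ranged vs. hashed placement when the key grows monotonically.

// 32-bit FNV-1a, used here only as a stand-in for MongoDB's internal hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the result as unsigned 32-bit
  }
  return h;
}

const NUM_SHARDS = 4;

// Ranged placement (hypothetical boundaries): chunks were split for existing
// ids 0..999, so every id at or above 750 belongs to the last shard.
function rangedShard(id) {
  return Math.min(NUM_SHARDS - 1, Math.floor(id / 250));
}

// Hashed placement, simplified to "hash modulo shard count".
function hashedShard(id) {
  return fnv1a(String(id)) % NUM_SHARDS;
}

const rangedCounts = new Array(NUM_SHARDS).fill(0);
const hashedCounts = new Array(NUM_SHARDS).fill(0);

// Insert 1000 new documents with ever-increasing ids (1000..1999).
for (let id = 1000; id < 2000; id++) {
  rangedCounts[rangedShard(id)]++;
  hashedCounts[hashedShard(id)]++;
}

console.log("ranged:", rangedCounts); // all new writes hit the last shard
console.log("hashed:", hashedCounts); // writes are spread over every shard
```

In a real cluster the balancer would eventually split and move the hot chunk, but until then a single shard absorbs all inserts; hashing the key avoids that pattern from the start.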
2. Fast Equality Queries
Optimized for Equality: Hashed indexes are optimized for equality-based queries and can quickly locate documents based on the hashed field.
// Find a user with a specific username
db.users.find({ "username": "john_doe" })
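The lookup path can be sketched with a toy model in plain JavaScript. This is an illustration under stated assumptions, not MongoDB's implementation: `toyHash`, `hashedIndex`, and `findByUsername` are hypothetical names, and the hash is deliberately weak so that a collision is easy to see. The point is that an equality lookup hashes the query value, fetches the candidates stored under that hash, and then compares the real field value, so a collision costs extra work but does not return wrong documents.

```javascript
// Toy model of an equality lookup through a hashed index (not MongoDB code).

// Deliberately weak hash so that collisions are easy to trigger.
function toyHash(str) {
  let sum = 0;
  for (let i = 0; i < str.length; i++) sum += str.charCodeAt(i);
  return sum % 16;
}

const users = [
  { _id: 1, username: "ab" },
  { _id: 2, username: "ba" }, // same toyHash as "ab"
  { _id: 3, username: "john_doe" },
];

// Build the "index": hash -> list of documents whose username has that hash.
const hashedIndex = new Map();
for (const doc of users) {
  const h = toyHash(doc.username);
  if (!hashedIndex.has(h)) hashedIndex.set(h, []);
  hashedIndex.get(h).push(doc);
}

// Equality lookup: hash the query value, fetch the candidates, then compare
// the real field value so colliding entries are filtered out.
function findByUsername(username) {
  const candidates = hashedIndex.get(toyHash(username)) || [];
  return candidates.filter((doc) => doc.username === username);
}

console.log(findByUsername("ab")); // only the document whose username is "ab"
```

With a 64-bit hash, collisions are far rarer than in this sketch, so the candidate list for a given value is almost always just the matching documents.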
3. Storage Efficiency
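- Compact Size: Hashed indexes can be more compact than regular indexes when the indexed values are large (for example, long strings), because each entry stores a fixed-size 64-bit hash rather than the value itself. For small values such as integers, the saving is negligible.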
4. Cache Efficiency
- Better Cache Utilization: The hash values are generally smaller and can be cached more efficiently, which can be beneficial for read-heavy workloads.
5. Predictable Query Performance
- Consistent Lookups: The performance of lookups remains consistent regardless of the distribution of values in the indexed field.
6. Simplified Index Management
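- Single Hashed Field: An index can contain only one hashed field, which keeps its definition simple. Since MongoDB 4.4 that field can be combined with non-hashed fields in a compound hashed index; earlier versions support only single-field hashed indexes.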
7. Handling High-Cardinality Fields
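- Efficiency: Hashed indexes work best on fields with very high cardinality (i.e., a large number of unique values), because many distinct values spread evenly across the hash space. A hashed index is still stored as a B-tree, keyed by hash rather than by value, so any gain over a regular index comes from the fixed-size keys and even distribution, not from a different data structure.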
Considerations and Limitations
No Range Queries: Hashed indexes are not suitable for range-based queries ($gt, $lt, etc.) or sorting operations.
Hash Collisions: While rare, hash collisions can occur where two different values produce the same hash. MongoDB uses a strong hash function to minimize this risk.
Limited Query Support: Hashed indexes support only equality matches (including $in) on the indexed field. They cannot serve inequality filters or sorts on that field, so such queries need a regular index.
No Text or Geospatial Support: Hashed indexes cannot be used for text search or geospatial queries.
CPU Overhead: Hashing operations can add some CPU overhead, although this is generally minimal.
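The range-query limitation follows from the fact that hashing does not preserve order. The sketch below is a toy illustration in plain JavaScript, not MongoDB code; `fnv1a` is an assumed stand-in for the server's hash function. It sorts a few values by their hash, the way a hashed index would order its entries, and shows that the result no longer matches the natural value order.

```javascript
// Toy illustration (not MongoDB code): hashing does not preserve value order.

// 32-bit FNV-1a, used here only as a stand-in for MongoDB's internal hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the result as unsigned 32-bit
  }
  return h;
}

// Values listed in ascending order, as a regular index would store them.
const values = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"];

// A hashed index orders its entries by hash, not by value.
const byHash = [...values].sort((a, b) => fnv1a(a) - fnv1a(b));

console.log("value order:", values.join(" "));
console.log("hash order: ", byHash.join(" "));
```

Because neighbouring values end up at unrelated positions, a query such as "all values between 3 and 6" has no contiguous slice of the hashed index to read, and the index order cannot be reused for sorting.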