.. _dict_file: *********************** C++ Dictionary Overview *********************** Dictionary handling in C++ traditionally relies on ``std::unordered_map``, which provides convenient automatic memory management but comes with limitations in specialized environments. The standard library implementation uses dynamic allocation with opaque memory policies, making it unsuitable for embedded systems, real-time applications, or projects requiring deterministic memory behavior and MISRA C++ compliance. The *C++ Dictionary module* in this library provides a lightweight, allocator-aware generic hash dictionary implemented in C++ and declared in ``dict.hpp``. Unlike ``std::unordered_map``, this container: * Integrates directly with custom allocator hierarchies (heap, arena, buddy, slab) * Explicitly tracks entry count, bucket occupancy, and the owning allocator * Enforces initialization through a factory pattern preventing uninitialized dicts * Provides automatic cleanup via RAII with custom deleters * Returns explicit success/error state using the ``Expected`` pattern * Supports deep copy, merge, and templated iteration via ``foreach()`` The only dependencies this module has within the CSalt library are the ``allocator.hpp`` and ``error.hpp`` files, making it suitable for integration into larger systems while maintaining minimal coupling. **Allocator Integration** ========================= By integrating directly with the ``Allocator`` base class defined in ``allocator.hpp``, the dictionary container supports multiple allocation strategies without changing the public API: * **HeapAllocator** — Standard heap allocation via ``operator new`` * **ArenaAllocator** — Sequential bump allocation with bulk deallocation * **BuddyAllocator** — Power-of-two block allocation with coalescing * **SlabAllocator** — Fixed-size object pools for uniform allocations * **Custom allocators** — Any user-defined class inheriting from ``Allocator`` Both the ``Dict`` struct itself and every internal node (key copy plus inline value buffer) are allocated through the provided allocator, enabling fully deterministic memory behaviour suitable for embedded, real-time, or safety-regulated environments. The user can also develop their own allocators as long as they conform to the ``Allocator`` base class. **Recommended Allocators** ========================== Different allocators are appropriate depending on how dictionaries are used and how frequently elements are inserted and removed: * **HeapAllocator** Best for: * general-purpose hash maps * development and testing * workloads with unpredictable insertion and removal patterns The heap allocator is flexible and easy to debug, making it the default choice for most use cases. * **ArenaAllocator** Best for: * build-once dictionaries * read-only lookup tables * configuration or initialization data This is highly efficient when the dictionary is constructed once and not modified or individually freed. * **BuddyAllocator** Best for: * dynamic dictionaries with frequent insertions and deletions * systems requiring controlled fragmentation * long-running applications Buddy allocation helps manage fragmentation when many nodes are allocated and returned over time. * **PoolAllocator / SlabAllocator** Best for: * high-frequency insert/remove workloads * uniform key/value sizes * performance-critical systems Dictionary nodes are typically uniform in size, making them an excellent match for pool or slab allocators. This can significantly reduce allocation overhead and improve cache locality. **Factory Pattern and RAII** ============================= Unlike traditional constructors, dictionary creation uses a static factory function ``Dict::init()`` that returns ``Expected*>``. This design: * Prevents creation of uninitialized or invalid dictionaries * Provides explicit error handling at the point of creation * Enables proper two-phase initialization (struct allocation + bucket array allocation) * Returns typed error objects (``ArgumentError``, ``MemoryError``) with descriptive messages Memory cleanup is automatic through the ``DictDeleter`` custom deleter, which works seamlessly with ``UniquePtr`` to provide RAII semantics: .. code-block:: cpp // Automatic cleanup when UniquePtr goes out of scope { cslt::HeapAllocator allocator; auto result = cslt::Dict::init(8, true, allocator); if (result.hasValue()) { cslt::UniquePtr, cslt::DictDeleter> d(result.value()); d->insert(1, 1.1f); d->insert(2, 2.2f); d->insert(3, 3.3f); } // Dict, all nodes, and all key copies automatically freed here } **Template Type Requirements** ================================ ``Dict`` is parameterised on two independent types. **Key type K** must satisfy ``std::is_trivially_copyable_v``. This constraint is enforced at compile time via ``static_assert``. Raw-byte hashing operates on ``sizeof(K)`` bytes, which is only correct when the object representation is the complete value representation — a guarantee provided by trivial copyability. Any attempt to instantiate ``Dict`` with a non-trivially-copyable key type will produce a clear compile-time error. Key equality is resolved at compile time using a C++17 detection trait: * If ``K`` provides ``operator==``, it is used for equality comparison. * Otherwise, ``memcmp`` over ``sizeof(K)`` bytes is used as a fallback. **Value type V** must be default-constructible, copy-constructible, and destructible. Default-constructibility is required because ``get()`` and ``pop()`` declare a local ``Expected`` which default-constructs ``V`` before a value is available. Copy-constructibility is required because values are copy-constructed into the node's inline buffer on insert and update. Destructibility is required because the destructor is called explicitly when a node is freed. +----------------------------+--------------------------------------------------+ | Requirement | Reason | +============================+==================================================+ | K trivially copyable | Raw-byte hashing and memcmp equality | +----------------------------+--------------------------------------------------+ | V default-constructible | Expected default constructor | +----------------------------+--------------------------------------------------+ | V copy-constructible | Placement-new into node inline buffer | +----------------------------+--------------------------------------------------+ | V destructible | Explicit destructor call on node free | +----------------------------+--------------------------------------------------+ **Hash Function** ================= The dictionary uses a MurmurHash3-inspired hash function operating on the raw bytes of the key. The function processes the key in 4-byte blocks with a finalisation mixing step, producing well-distributed 32-bit hashes suitable for general-purpose use. The hash seed is fixed at ``0xdeadbeef``. Bucket indices are computed as ``hash(key) % bucket_count``, where ``bucket_count`` is always a power of two. Because the bucket count is a power of two a bitwise AND could replace the modulo; the modulo form is retained for clarity. **Growth Strategy** =================== When growth is enabled (``growth == true`` in ``init()``), the bucket array is resized automatically when the load factor exceeds 0.75. The growth strategy is tiered to limit memory overhead at large sizes: +------------------------+--------------------------------------------+ | Current bucket count | New bucket count | +========================+============================================+ | < 4,096 buckets | 2× current | +------------------------+--------------------------------------------+ | ≥ 4,096 buckets | current + 1,024 (linear increment) | +------------------------+--------------------------------------------+ The new bucket count is always rounded up to the next power of two. All existing nodes are rehashed into the new bucket array in a single pass; the old bucket array is freed via the allocator immediately afterwards. When growth is disabled (``growth == false``), ``insert()`` returns ``false`` when ``hash_size() >= bucket_count()``, leaving the caller in full control of memory allocation. **Node Layout** =============== Each key-value pair is stored in a single chain of nodes within a bucket. A node consists of: * A **node header** (``next`` pointer + ``key_data`` pointer) allocated as one block of ``sizeof(Node) + sizeof(V)`` bytes. * A **key copy** — a separate allocation of ``sizeof(K)`` bytes holding a ``memcpy`` of the caller's key. This ensures that the key's lifetime is tied to the node, not the caller's stack or buffer. * An **inline value buffer** — the ``sizeof(V)`` bytes immediately following the node header in the same allocation, populated via placement new. This layout means each insert performs exactly two allocator calls (node + key), and each remove performs exactly two frees. **Iteration** ============= The ``foreach()`` method accepts any callable with signature ``void(const K&, const V&)``, including lambdas, functors, and plain function pointers: .. code-block:: cpp cslt::HeapAllocator allocator; auto r = cslt::Dict::init(8, true, allocator); if (!r.hasValue()) return; cslt::UniquePtr, cslt::DictDeleter> d(r.value()); d->insert(1, 1.1f); d->insert(2, 2.2f); d->insert(3, 3.3f); // Lambda accumulator float total = 0.0f; d->foreach([&total](const int& key, const float& value) { total += value; }); Traversal order is bucket order, which is neither insertion order nor key order. The callback must not insert, update, remove, or clear entries during traversal. **Copy and Merge** ================== Two utility factory methods are provided for building new dictionaries from existing ones. ``copy()`` creates a fully independent deep copy. All nodes are copy-constructed into a fresh bucket array of the same capacity. A convenience single-argument overload reuses the source dictionary's own allocator: .. code-block:: cpp // Two-argument form — explicit allocator auto cr = cslt::Dict::copy(*src, allocator); // One-argument form — uses src's own allocator auto cr = cslt::Dict::copy(*src); ``merge()`` combines two dictionaries into a new one. All entries from the first source are inserted first, then entries from the second source are processed according to the ``overwrite`` flag: .. code-block:: cpp // overwrite == true: second dict's values win on key conflicts // overwrite == false: first dict's values are preserved on conflicts auto mr = cslt::Dict::merge(*a, *b, true, allocator); Neither source dictionary is modified by ``merge()``. **String Keys** =============== The most common use case for a dictionary is mapping string keys to values. ``cslt::String`` cannot be used directly as a key because its private destructor and deleted copy operations make it non-trivially copyable — a requirement enforced by ``Dict``'s ``static_assert``. ``cslt::StringKey`` solves this cleanly. It copies the string content into a fixed-size stack buffer of ``N`` bytes (including the null terminator), producing a type that is trivially copyable, zero-cost to hash, and requires no heap allocation. Strings longer than ``N-1`` characters are silently truncated, so choose ``N`` based on the longest key you expect to store. +-------------------+--------------------------------------------+ | Type | Recommended use | +===================+============================================+ | StringKey<32> | Short identifiers, sensor names, config | +-------------------+--------------------------------------------+ | StringKey<64> | Dictionary words, variable names | +-------------------+--------------------------------------------+ | StringKey<128> | Sentences, file names | +-------------------+--------------------------------------------+ | StringKey<256> | File paths, long descriptors | +-------------------+--------------------------------------------+ A type alias keeps call sites concise: .. code-block:: cpp using Key64 = cslt::StringKey<64>; **Basic string key usage:** .. code-block:: cpp #include "dict.hpp" using Key64 = cslt::StringKey<64>; cslt::HeapAllocator alloc; auto r = cslt::Dict::init(8, true, alloc); if (!r.hasValue()) return; cslt::UniquePtr, cslt::DictDeleter> d(r.value()); // Insert using C-string literals d->insert(Key64{"pi"}, 3.14159f); d->insert(Key64{"euler"}, 2.71828f); d->insert(Key64{"phi"}, 1.61803f); // Look up by C-string literal auto gr = d->get(Key64{"pi"}); if (gr.hasValue()) { float val = gr.value(); // 3.14159f } bool found = d->has_key(Key64{"euler"}); // true bool missing = d->has_key(Key64{"zero"}); // false **Constructing a key from a cslt::String:** .. code-block:: cpp #include "dict.hpp" #include "string.hpp" using Key64 = cslt::StringKey<64>; cslt::HeapAllocator alloc; auto r = cslt::Dict::init(8, true, alloc); if (!r.hasValue()) return; cslt::UniquePtr, cslt::DictDeleter> d(r.value()); d->insert(Key64{"temperature"}, 98.6f); // Build a lookup key from a runtime cslt::String auto rs = cslt::String::init("temperature", 0, alloc); if (rs.hasValue()) { cslt::UniquePtr s(rs.value()); auto gr = d->get(Key64{*s}); // construct key from String if (gr.hasValue()) { float temp = gr.value(); // 98.6f } } **Iterating over string-keyed entries:** .. code-block:: cpp d->foreach([](const Key64& key, const float& value) { // key.c_str() returns the null-terminated string // key.size() returns the logical length }); **Updating and removing string keys:** .. code-block:: cpp d->update(Key64{"pi"}, 3.14f); // overwrite existing value auto pr = d->pop(Key64{"euler"}); // remove and return value if (pr.hasValue()) { float e = pr.value(); // 2.71828f } StringKey Class =============== .. doxygenclass:: cslt::StringKey :members: Dict Class ========== .. doxygenclass:: cslt::Dict :members: