Rename `_null` Column To `_deleted` Tombstone Flag For Clarity

August 11, 2025 by StackCamp Team

Renaming `_null` Column to `_deleted` Tombstone Flag

In database management, clear and consistent naming matters, especially for columns that serve critical functions such as marking records for deletion. This article looks at a proposal to rename the `_null` column to `_deleted` within the Tonbo framework: the rationale behind the change, the benefits it offers, and how the transition would work. The change is small, but it has a real effect on the usability and maintainability of the system.

The Current Issue: Conceptual Confusion with `_null`

Currently, a reserved boolean column named `_null` acts as a row-level tombstone. In database terminology, a tombstone is a marker indicating that a record has been logically deleted, even though it may still physically exist in storage. This approach is common in systems that prioritize performance and availability, because deletions can be processed asynchronously without immediately rewriting large amounts of data. However, the name `_null` clashes conceptually with per-column nullability. A nullable column is one that can contain a NULL value, representing missing or unknown data. That is distinct from a row being marked as deleted.
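To make the distinction concrete, here is a minimal in-memory sketch of tombstone semantics. It is not Tonbo code: `Row`, `Table`, and all of their methods are hypothetical names chosen for illustration. A live row may hold a NULL value (`None`), while a deleted row is merely flagged and stays in storage until a later compaction.

```rust
/// Hypothetical row layout: a tombstone flag plus user data.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    /// Row-level tombstone: true means "logically deleted".
    pub deleted: bool,
    pub key: u64,
    /// Column-level nullability: None means "value is NULL", not "row is deleted".
    pub value: Option<String>,
}

/// Hypothetical append-only table used only for this illustration.
pub struct Table {
    rows: Vec<Row>,
}

impl Table {
    pub fn new() -> Self {
        Table { rows: Vec::new() }
    }

    pub fn insert(&mut self, key: u64, value: Option<String>) {
        self.rows.push(Row { deleted: false, key, value });
    }

    /// Logical delete: flip the tombstone on live rows with this key.
    /// Returns true if at least one live row was marked.
    pub fn delete(&mut self, key: u64) -> bool {
        let mut hit = false;
        for row in self.rows.iter_mut().filter(|r| r.key == key && !r.deleted) {
            row.deleted = true;
            hit = true;
        }
        hit
    }

    /// Equivalent of a scan that filters out tombstoned rows.
    pub fn scan(&self) -> impl Iterator<Item = &Row> {
        self.rows.iter().filter(|r| !r.deleted)
    }

    /// Number of rows physically stored, including tombstoned ones.
    pub fn physical_len(&self) -> usize {
        self.rows.len()
    }

    /// Physical removal, done later and separately from the logical delete.
    pub fn compact(&mut self) {
        self.rows.retain(|r| !r.deleted);
    }
}
```

In this sketch, inserting key 1 with `None` and key 2 with a value, then deleting key 2, leaves one row visible to `scan` (the one whose value is NULL) while `physical_len` still reports two rows until `compact` runs.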

This naming conflict causes several problems. First, it makes SQL queries awkward to read: filtering out deleted rows requires `WHERE NOT _null`, which reads like a nullability check rather than a deletion filter. The name does not convey the column's purpose as a deletion marker. Second, the existing codebase hardcodes the string `"_null"` in many places, including schemas, arrays, macros, tests, and examples, which makes refactoring and future changes harder. The current implementation works, but the unclear name invites errors, slows development, and obscures the system's behavior. What is needed is a name that accurately reflects the column's purpose and cannot be mistaken for another database concept.

The Proposed Solution: Renaming to `_deleted`

To address these issues, the proposal is to rename the `_null` column to `_deleted`, a name that directly reflects the column's function as a marker for deleted rows. The proposal also introduces a `DELETED` constant for programmatic use, so that code has one standard way to refer to the deleted state instead of repeating a string literal.

Specifically, the plan is to update the dynamic schema builders and macros to emit `Field::new("_deleted", DataType::Boolean, false)` as column 0, so that the new name appears in the schema. The underlying storage layout (`USER_COLUMN_OFFSET = 2`) and semantics remain unchanged. The rename therefore requires no data migration and no change to how data is stored on disk, which minimizes disruption and preserves backward compatibility.
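The layout described above can be sketched as follows. This is an illustration under stated assumptions, not the actual builder: `Field` and `DataType` here are small stand-ins for the Arrow types the article names, `build_schema` is a hypothetical helper, `DELETED` is assumed to hold the column name, and the second reserved column is a placeholder called `_reserved` because the article does not name it.

```rust
/// Stand-in for Arrow's `DataType`, reduced to what this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Boolean,
    UInt32,
    Int64,
    Utf8,
}

/// Stand-in for Arrow's `Field`, mirroring the `Field::new(name, type, nullable)` shape.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

impl Field {
    pub fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Assumption: the proposed `DELETED` constant holds the tombstone column name.
pub const DELETED: &str = "_deleted";

/// Number of reserved columns that precede user columns (value from the article).
pub const USER_COLUMN_OFFSET: usize = 2;

/// Hypothetical schema builder: reserved columns first, user columns after.
pub fn build_schema(user_fields: Vec<Field>) -> Vec<Field> {
    let mut fields = Vec::with_capacity(USER_COLUMN_OFFSET + user_fields.len());
    // Column 0: the non-nullable boolean tombstone, under its new name.
    fields.push(Field::new(DELETED, DataType::Boolean, false));
    // Column 1: placeholder for the other reserved column; name and type are assumed.
    fields.push(Field::new("_reserved", DataType::UInt32, false));
    // User columns start at USER_COLUMN_OFFSET, exactly as before the rename.
    fields.extend(user_fields);
    fields
}
```

Because only the name of column 0 changes, the first user column still sits at index `USER_COLUMN_OFFSET`, which is why the rename needs no change to the storage layout.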

To smooth the transition, the proposal adds read-side aliasing: the system accepts either `_deleted` or the legacy `_null` in queries. If both are present, an error is raised to prevent ambiguity. A deprecation warning is emitted whenever `_null` is used, encouraging users to switch to the new name. For one release cycle, all writes will default to using `_deleted`. This gives existing users a grace period to adapt while ensuring that new data is written under the new column name.
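The aliasing rules above can be sketched as a small resolver. This is one possible shape under stated assumptions, not the proposal's actual API: `resolve_tombstone`, `Resolved`, `TombstoneError`, and `LEGACY_NULL` are hypothetical names, and the `Missing` case is an addition for completeness that the article does not discuss.

```rust
/// Assumption: the proposed constant holds the new column name.
pub const DELETED: &str = "_deleted";
/// Hypothetical constant for the legacy name accepted during the transition.
pub const LEGACY_NULL: &str = "_null";

#[derive(Debug, PartialEq)]
pub enum TombstoneError {
    /// Both `_deleted` and `_null` are present, so the reference is ambiguous.
    Ambiguous,
    /// Neither name is present (not covered by the article; added for completeness).
    Missing,
}

#[derive(Debug, PartialEq)]
pub struct Resolved {
    /// Position of the tombstone column among the given column names.
    pub index: usize,
    /// True when the legacy `_null` name was used; the caller should warn.
    pub deprecated: bool,
}

/// Resolve which column is the tombstone, accepting either name but never both.
pub fn resolve_tombstone(columns: &[&str]) -> Result<Resolved, TombstoneError> {
    let new_idx = columns.iter().position(|c| *c == DELETED);
    let old_idx = columns.iter().position(|c| *c == LEGACY_NULL);
    match (new_idx, old_idx) {
        (Some(_), Some(_)) => Err(TombstoneError::Ambiguous),
        (Some(index), None) => Ok(Resolved { index, deprecated: false }),
        (None, Some(index)) => Ok(Resolved { index, deprecated: true }),
        (None, None) => Err(TombstoneError::Missing),
    }
}
```

Returning a `deprecated` flag instead of logging inside the resolver leaves the choice of warning mechanism to the caller, which keeps the function easy to test.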

Improvements Gained: Clarity, Ergonomics, and Consistency

Renaming `_null` to `_deleted` brings several improvements. First, it improves clarity. The name `_deleted` states the column's purpose directly: it marks rows that have been deleted, with none of the ambiguity of `_null`. That clarity carries over to SQL, where a query that filters out deleted rows now uses the condition `WHERE NOT _deleted`.

Second, the change improves query ergonomics. The condition `WHERE NOT _deleted` is more natural and self-explanatory than `WHERE NOT _null`, so queries that deal with deleted rows are easier to write and to review. Developers no longer have to remember that a column named after nullability is really a deletion marker, which reduces cognitive load and the chance of mistakes.

Finally, the renaming promotes consistency with established terminology. The term
