Rename `_null` Column To `_deleted` Tombstone Flag For Clarity
In database management, clarity and consistency are paramount, and column naming is one place where they matter most, especially for columns that serve critical functions such as marking records for deletion. This article examines a proposal to rename the `_null` column to `_deleted` within the Tonbo framework: the rationale behind the change, the benefits it offers, and the alternatives considered. Small as it seems, the change can have a real impact on the usability and maintainability of a database system.
The Current Issue: Conceptual Confusion with _null
Currently, a reserved boolean column named `_null` acts as a row-level tombstone. In database terminology, a tombstone is a marker indicating that a record has been logically deleted, even though it may still physically exist in storage. This approach is common in systems that prioritize performance and availability, because deletions can be processed asynchronously without immediately rewriting large amounts of data. However, the name `_null` presents a significant problem: it clashes conceptually with per-column nullability. A nullable column is one that can contain a `NULL` value, representing missing or unknown data. That is distinct from a row being marked as deleted.
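The distinction between the two concepts can be made concrete with a small Rust sketch. The `Row` type and the `visible` helper are hypothetical names chosen for illustration, not Tonbo's API: an `Option` field models per-column nullability, while a separate boolean models the row-level tombstone.

```rust
/// Hypothetical row type, for illustration only (not Tonbo's actual layout).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    /// Row-level tombstone: `true` means the whole row is logically deleted.
    deleted: bool,
    id: u64,
    /// Per-column nullability: `None` means only this value is missing.
    nickname: Option<String>,
}

/// Returns the rows that are still live, i.e. not tombstoned.
/// A `None` nickname does not hide a row; only the tombstone does.
fn visible(rows: &[Row]) -> Vec<&Row> {
    rows.iter().filter(|r| !r.deleted).collect()
}
```

A row with a missing `nickname` still shows up in `visible`, whereas a tombstoned row is skipped even when every one of its values is present.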
This naming conflict leads to several issues. First and foremost, it makes SQL queries awkward to read. A query that filters out deleted rows has to use `WHERE NOT _null`, which is not immediately intuitive because the name does not convey the column's purpose as a deletion marker. Moreover, the existing codebase hardcodes the string `"_null"` in various places, including schemas, arrays, macros, tests, and examples, which makes refactoring and future changes harder. The current implementation is functional, but this lack of clarity can lead to errors, longer development time, and difficulty understanding the system's behavior. What is needed is a name that accurately reflects the column's purpose and avoids confusion with other database concepts.
The Proposed Solution: Renaming to _deleted
To address the issues with the `_null` column name, the proposal is to rename it to `_deleted`, a name that directly reflects the column's function as a marker for deleted rows. Alongside the rename, the proposal introduces a `DELETED` constant for programmatic use, giving code a standardized way to refer to the deleted state.
Specifically, the plan involves updating dynamic schema builders and macros to emit `Field::new("_deleted", DataType::Boolean, false)` as column 0, so that the new column is correctly defined in the schema. Importantly, the underlying storage layout (`USER_COLUMN_OFFSET = 2`) and semantics remain unchanged. The rename therefore requires no data migration and no change to how data is stored on disk, which minimizes disruption and helps preserve backward compatibility.
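As a rough sketch of what such a schema builder might look like, the snippet below places the tombstone field at column 0 and keeps user columns starting at `USER_COLUMN_OFFSET`. Several things here are assumptions: `DataType` and `Field` are minimal local stand-ins for the Arrow types so the example compiles without external crates, `build_fields` is a hypothetical helper, `DELETED` is assumed to hold the column name, and the second reserved column is not named in this article, so it is passed in by the caller.

```rust
/// Local stand-ins for the Arrow types named in the text; real code would
/// import them from the `arrow` crate instead.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Boolean,
    UInt32,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Assumed form of the proposed constant: the tombstone column's name.
const DELETED: &str = "_deleted";
/// Number of reserved columns that precede user columns (unchanged).
const USER_COLUMN_OFFSET: usize = 2;

/// Hypothetical builder: reserved columns first, then user columns.
fn build_fields(second_reserved: Field, user_fields: Vec<Field>) -> Vec<Field> {
    let mut fields = Vec::with_capacity(USER_COLUMN_OFFSET + user_fields.len());
    // Column 0: the non-nullable boolean tombstone flag.
    fields.push(Field::new(DELETED, DataType::Boolean, false));
    // Column 1: the other reserved column, supplied by the caller.
    fields.push(second_reserved);
    debug_assert_eq!(fields.len(), USER_COLUMN_OFFSET);
    // User columns keep their existing positions, from USER_COLUMN_OFFSET on.
    fields.extend(user_fields);
    fields
}
```

Because only the name of column 0 changes, any code that indexes user columns through `USER_COLUMN_OFFSET` would keep working unchanged under this sketch.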
To facilitate a smooth transition, the proposal adds read-side aliasing: the system will accept either `_deleted` or the legacy `_null` in queries. If both are present, an error is raised to prevent ambiguity. A deprecation warning is emitted whenever `_null` is used, encouraging users to switch to `_deleted`. For one release cycle, all writes will default to using `_deleted`, which gives users a grace period to adapt while ensuring that new data is written under the new column name. This phased approach is designed to minimize disruption for existing users, and the `DELETED` constant reinforces the consistency by giving code a single standardized way to refer to the deleted state.
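The aliasing rules described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `resolve_tombstone`, `Resolved`, `TombstoneError`, and `LEGACY_NULL` are hypothetical names, `DELETED` is assumed to hold the column name, and the `Missing` case (neither name present) is not covered by the proposal and is added only so the function is total.

```rust
/// Assumed form of the proposed constant: the new tombstone column name.
const DELETED: &str = "_deleted";
/// Hypothetical constant for the legacy name accepted during the transition.
const LEGACY_NULL: &str = "_null";

#[derive(Debug, PartialEq)]
enum TombstoneError {
    /// Both names were present, so the intended column is ambiguous.
    Ambiguous,
    /// Neither name was present (an assumption; not part of the proposal).
    Missing,
}

#[derive(Debug, PartialEq)]
struct Resolved {
    /// Position of the tombstone column in the column list.
    index: usize,
    /// True when the legacy name was used and a deprecation warning was emitted.
    deprecated: bool,
}

/// Finds the tombstone column, accepting either the new or the legacy name.
fn resolve_tombstone(columns: &[&str]) -> Result<Resolved, TombstoneError> {
    let new_idx = columns.iter().position(|c| *c == DELETED);
    let old_idx = columns.iter().position(|c| *c == LEGACY_NULL);
    match (new_idx, old_idx) {
        // Both names present: refuse to guess which one is meant.
        (Some(_), Some(_)) => Err(TombstoneError::Ambiguous),
        (Some(index), None) => Ok(Resolved { index, deprecated: false }),
        (None, Some(index)) => {
            // Legacy name still works, but nudges the user toward the new one.
            eprintln!("warning: column `_null` is deprecated; use `_deleted`");
            Ok(Resolved { index, deprecated: true })
        }
        (None, None) => Err(TombstoneError::Missing),
    }
}
```

Returning the `deprecated` flag alongside the index lets a caller decide how to surface the warning, for example through a logger instead of standard error.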
Improvements Gained: Clarity, Ergonomics, and Consistency
Renaming the `_null` column to `_deleted` brings several key improvements. First and foremost, it enhances clarity. The name `_deleted` leaves no room for confusion about the column's purpose: it marks rows that have been deleted, without the ambiguity associated with `_null`. This clarity extends to SQL queries. For example, a query that filters out deleted rows now uses the intuitive condition `WHERE NOT _deleted`.
Secondly, the change improves query ergonomics. The more descriptive name makes queries about deleted rows easier to write and understand: `WHERE NOT _deleted` is far more natural and self-explanatory than `WHERE NOT _null`. This reduces cognitive load for developers, which can lead to faster development and fewer errors.
Finally, the renaming promotes consistency with established terminology. The term