Enhancing Seseragi With Union Types And Updated ADT Syntax
This article examines proposed changes to the Seseragi type system: the introduction of union types and an updated syntax for algebraic data types (ADTs). Both changes aim to make type definitions in the language more flexible and expressive. The sections below cover the current state, the proposed changes, the implementation plan, and the expected results.
Overview
This comprehensive proposal outlines two key enhancements to the Seseragi type system:
- Updating ADT Syntax: Modifying the current syntax for Algebraic Data Types (ADTs) to utilize leading pipe symbols.
- Introducing Union Types: Adding a new feature that enables the combination of existing types.
Current State of the Seseragi Type System
Before diving into the proposed changes, it's crucial to understand the existing type declaration syntax in Seseragi. Currently, the language supports ADTs, type aliases, and structs, each with its own distinct syntax.
Current Type Declaration Syntax
- ADT:
type Color = Red | Green | Blue
- Type Alias:
type UserId = Int
- Struct:
struct Point { x: Int, y: Int }
In the current system, ADTs are defined using the type keyword followed by the type name, an equals sign, and a series of constructors separated by pipe symbols. Type aliases, on the other hand, simply assign a new name to an existing type. Structs, similar to structures in other languages, define a composite type with named fields.
Parser Behavior
The Seseragi parser handles these declarations in dedicated methods. The typeDeclaration() method parses constructs that begin with the type keyword, while struct declarations are handled separately by the structDeclaration() method. Within typeDeclaration(), ADTs and type aliases are distinguished by the presence or absence of pipe symbols.
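The pipe-based rule can be sketched in TypeScript, the language the compiler is written in. This is a minimal, hypothetical illustration: the string-token representation, the tokenize helper, and the classifyCurrent function are assumptions for this example, not the actual code in src/parser.ts.

```typescript
// Hypothetical sketch only: the real typeDeclaration() in src/parser.ts works on the
// parser's own token stream. Here a declaration is a plain array of string tokens.
type DeclKind = "adt" | "alias";

// Toy tokenizer: split on whitespace, keeping "|" and "=" as separate tokens.
function tokenize(src: string): string[] {
  return src
    .replace(/([|=])/g, " $1 ")
    .split(/\s+/)
    .filter((t) => t.length > 0);
}

// Current rule: after "type Name =", any "|" means ADT; otherwise it is an alias.
function classifyCurrent(tokens: string[]): DeclKind {
  const eq = tokens.indexOf("=");
  if (tokens[0] !== "type" || eq === -1) {
    throw new Error("not a type declaration");
  }
  return tokens.slice(eq + 1).indexOf("|") !== -1 ? "adt" : "alias";
}

console.log(classifyCurrent(tokenize("type Color = Red | Green | Blue"))); // adt
console.log(classifyCurrent(tokenize("type UserId = Int")));               // alias
```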
Proposed Changes to Seseragi
The proposal introduces a revised syntax for ADTs and the addition of union types, significantly enhancing the type system's capabilities. These changes are designed to improve code readability, maintainability, and expressiveness.
1. ADT Syntax Update
The first proposed change updates the syntax for defining ADTs. The current syntax uses pipe symbols to separate constructors; the new syntax places a pipe symbol in front of every constructor instead. This change aims to improve readability, especially for multi-line ADT definitions.
- Current:
type Color = Red | Green | Blue
- New:
type Color = | Red | Green | Blue
The new syntax is more consistent, since every constructor, including the first, is introduced by the same leading pipe symbol. When constructors are stacked vertically, these pipes line up and make the structure of the ADT apparent.
Multi-line Format
The benefits of the new syntax become even more pronounced when defining ADTs with numerous constructors spanning multiple lines. The following examples illustrate the improved readability of the multi-line format:
type Color =
| Red
| Green
| Blue
| Yellow
| Purple
type Number =
| Positive Int
| Negative Int
| Zero
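Because the multi-line format only adds line breaks, a parser that treats newlines as whitespace can read both layouts identically. The following TypeScript sketch is hypothetical (tokenize, parseLeadingPipeAdt, and the CtorDecl shape are illustrative names, not Seseragi's parser API); it parses the leading-pipe form into constructors and their field types, and checks that the multi-line Number definition and its one-line equivalent give the same result.

```typescript
// Hypothetical sketch, not Seseragi's real parser. Newlines count as ordinary
// whitespace, so one-line and multi-line definitions produce the same tokens.
interface CtorDecl {
  ctorName: string;  // e.g. "Positive"
  fields: string[];  // field types, e.g. ["Int"]
}

function tokenize(src: string): string[] {
  return src.replace(/([|=])/g, " $1 ").split(/\s+/).filter((t) => t.length > 0);
}

// Parse `type Name = | A ... | B ...`. The first "|" is required (new syntax);
// every "|" starts a constructor whose first word is its name and the rest its fields.
function parseLeadingPipeAdt(src: string): { typeName: string; ctors: CtorDecl[] } {
  const tokens = tokenize(src);
  if (tokens[0] !== "type" || tokens[2] !== "=" || tokens[3] !== "|") {
    throw new Error("expected `type Name = | ...`");
  }
  const ctors: CtorDecl[] = [];
  for (const tok of tokens.slice(3)) {
    if (tok === "|") {
      ctors.push({ ctorName: "", fields: [] }); // start a new constructor
      continue;
    }
    const current = ctors[ctors.length - 1];
    if (current.ctorName === "") current.ctorName = tok; // constructor name
    else current.fields.push(tok);                       // field type
  }
  if (ctors.some((c) => c.ctorName === "")) throw new Error("empty constructor");
  return { typeName: tokens[1], ctors };
}

const multiLine = parseLeadingPipeAdt(`type Number =
  | Positive Int
  | Negative Int
  | Zero`);
const oneLine = parseLeadingPipeAdt("type Number = | Positive Int | Negative Int | Zero");
console.log(JSON.stringify(multiLine) === JSON.stringify(oneLine)); // true
```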
2. Union Types: A New Feature
The second major proposal introduces union types as a new feature in Seseragi. Union types allow you to combine existing types, creating a new type that can hold values of any of the combined types. This feature significantly enhances the flexibility and expressiveness of the type system.
type ID = String | Int
type Response = Success | Error
type Value = String | Int | Bool
Union types are particularly useful in scenarios where a variable or function parameter can accept values of different types. For example, an ID type might be either a String or an Int, allowing for different ways to identify an entity. Similarly, a Response type could be either Success or Error, representing the outcome of an operation. Union types greatly improve the handling of diverse data types in a structured and type-safe manner.
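To show what type-safe handling of a union looks like in practice, here is the equivalent in TypeScript, the language Seseragi compiles to. The mapping of Seseragi's String, Int, and Bool to TypeScript's string, number, and boolean is an assumption, and formatId and describeValue are hypothetical helpers.

```typescript
// TypeScript illustration. Assumption: Seseragi's String, Int and Bool map to
// string, number and boolean; formatId and describeValue are hypothetical helpers.
type ID = string | number;
type Value = string | number | boolean;

function formatId(id: ID): string {
  if (typeof id === "string") {
    return `user:${id}`;           // narrowed to string
  }
  return `user#${id.toFixed(0)}`;  // only number remains, so number methods are allowed
}

function describeValue(v: Value): string {
  if (typeof v === "string") return `string of length ${v.length}`;
  if (typeof v === "number") return `number ${v}`;
  return `boolean ${v}`;           // only boolean remains
}

console.log(formatId("alice"));    // user:alice
console.log(formatId(42));         // user#42
console.log(describeValue(true));  // boolean true
```

The checker allows string operations only after the typeof test has ruled out the other members, which is the kind of guarantee union types are meant to bring to Seseragi.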
Implementation Plan
The implementation of these changes will be carried out in a phased approach, ensuring a smooth transition and minimizing the risk of introducing errors. Each phase focuses on specific aspects of the implementation, from initial investigation to final testing and deployment.
Phase 1: Investigation & Safety
The initial phase is dedicated to thoroughly investigating the existing codebase and identifying potential impacts of the proposed changes. This phase is crucial for ensuring the safety and stability of the Seseragi ecosystem.
- [ ] Audit all existing ADT usage in codebase
- [ ] Identify pattern matching dependencies
- [ ] Map type inference impact areas
- [ ] Ensure all existing tests pass
This involves auditing existing ADT usage to see where and how ADTs appear throughout the codebase. Identifying pattern matching dependencies is crucial to ensure that existing pattern matching logic continues to work correctly after the syntax update. Mapping the impact on type inference helps to anticipate and address potential issues in the type system. Finally, ensuring that all existing tests pass provides a baseline for evaluating the correctness of the changes.
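Part of the audit could be automated. The sketch below is a hypothetical TypeScript helper (auditTypeDeclarations is not part of the Seseragi codebase) that scans source text line by line and classifies each type declaration by its pipe style. Declarations reported as infix-pipe are the ones whose reading the union-type change could affect. It is a rough, regex-based approximation that only follows continuation lines starting with a pipe; reusing the compiler's own parser would be more reliable.

```typescript
// Hypothetical audit helper: line-based and regex-driven, so it only approximates
// what the real parser accepts.
interface AdtUsage {
  line: number;     // 1-based line where the declaration starts
  typeName: string;
  style: "leading-pipe" | "infix-pipe" | "no-pipe";
}

function auditTypeDeclarations(source: string): AdtUsage[] {
  const lines = source.split("\n");
  const results: AdtUsage[] = [];
  for (let i = 0; i < lines.length; i++) {
    const m = /^\s*type\s+(\w+)\s*=(.*)$/.exec(lines[i]);
    if (!m) continue;
    // Gather continuation lines that begin with "|" (multi-line format).
    let rhs = m[2];
    for (let j = i + 1; j < lines.length && /^\s*\|/.test(lines[j]); j++) {
      rhs += " " + lines[j];
    }
    const trimmed = rhs.trim();
    const style = trimmed.startsWith("|") ? "leading-pipe"
      : trimmed.includes("|") ? "infix-pipe"
      : "no-pipe";
    results.push({ line: i + 1, typeName: m[1], style });
  }
  return results;
}

const report = auditTypeDeclarations(
  "type Color = Red | Green | Blue\ntype UserId = Int\ntype Number =\n  | Positive Int\n  | Zero",
);
console.log(report.map((r) => `${r.line}:${r.typeName}:${r.style}`).join(", "));
// 1:Color:infix-pipe, 2:UserId:no-pipe, 3:Number:leading-pipe
```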
Phase 2: Parser Updates (src/parser.ts)
Phase 2 focuses on modifying the parser to accommodate the new ADT syntax and recognize union types. This involves updating src/parser.ts, the file responsible for parsing Seseragi code.
- [ ] Modify typeDeclaration() logic for new syntax
- [ ] Maintain backward compatibility for existing Red | Green syntax
- [ ] Add Union type detection logic
- [ ] Prioritize new | Red | Green syntax
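The typeDeclaration() logic needs to be updated to handle the leading pipe symbol in the new ADT syntax. Importantly, backward compatibility with the existing syntax must be maintained to allow for a gradual transition. New logic will be added to detect union types. Because a union such as String | Int has the same surface form as an old-style ADT such as Red | Green, this detection cannot rely on syntax alone and needs additional information, for example whether each member names an existing type. The parser will prioritize the new | Red | Green syntax while still recognizing the old syntax.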
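The decision order above can be sketched as follows. This is a hypothetical TypeScript illustration, not the real typeDeclaration(): the classify function, the knownTypes set, and the rule that an infix-pipe declaration is a union when every member is a single, already-known type name are all assumptions made for the example.

```typescript
// Hypothetical sketch of the Phase 2 decision order, not the real typeDeclaration().
// ASSUMPTION: "A | B" without a leading pipe is a union when every member is a single,
// already-known type name, and an old-style ADT otherwise.
type DeclKind = "adt-new" | "adt-old" | "union" | "alias";

function tokenize(src: string): string[] {
  return src.replace(/([|=])/g, " $1 ").split(/\s+/).filter((t) => t.length > 0);
}

function classify(src: string, knownTypes: ReadonlySet<string>): DeclKind {
  const tokens = tokenize(src);
  const eq = tokens.indexOf("=");
  if (tokens[0] !== "type" || eq === -1) throw new Error("not a type declaration");
  const rhs = tokens.slice(eq + 1);
  if (rhs[0] === "|") return "adt-new";        // 1. new syntax takes priority
  if (rhs.indexOf("|") === -1) return "alias"; // 2. no pipe at all
  // 3. split "A | B C | D" into members: [["A"], ["B", "C"], ["D"]]
  const members: string[][] = [[]];
  for (const tok of rhs) {
    if (tok === "|") members.push([]);
    else members[members.length - 1].push(tok);
  }
  // 4. all members are single known types -> union; otherwise old ADT syntax
  const allKnown = members.every((m) => m.length === 1 && knownTypes.has(m[0]));
  return allKnown ? "union" : "adt-old";
}

const known = new Set(["String", "Int", "Bool", "Float", "Success", "Error"]);
console.log(classify("type Color = | Red | Green | Blue", known)); // adt-new
console.log(classify("type Color = Red | Green | Blue", known));   // adt-old
console.log(classify("type ID = String | Int", known));            // union
console.log(classify("type UserId = Int", known));                 // alias
```

Checking the leading pipe first means a new-style declaration is never subject to the union heuristic, which matches the stated priority for the | Red | Green form.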
Phase 3: AST Extension (src/ast.ts)
The Abstract Syntax Tree (AST) represents the structure of the code in a way that the compiler can understand. Phase 3 involves extending the AST to represent union types.
- [ ] Keep existing TypeDeclaration intact
- [ ] Add new UnionTypeDeclaration class
- [ ] Ensure no breaking changes to existing nodes
To minimize disruption, the existing TypeDeclaration class will be kept intact. A new UnionTypeDeclaration class will be added to represent union types in the AST. It's crucial that these changes do not break any existing AST nodes, so that code built on them keeps working.
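A minimal sketch of the two AST nodes in TypeScript follows. The proposal does not show the fields of TypeDeclaration, so typeName, variants, and the kind discriminant are assumptions; so is the summarize function, which shows how new code can handle both nodes while code that only knows TypeDeclaration stays unchanged.

```typescript
// Hypothetical AST shapes: the proposal does not show TypeDeclaration's fields,
// so typeName, variants and kind are assumptions for this sketch.
interface Variant {
  ctorName: string;
  fieldTypes: string[];
}

// Existing node, kept intact: ADTs such as `type Color = | Red | Green | Blue`.
class TypeDeclaration {
  readonly kind = "TypeDeclaration" as const;
  readonly typeName: string;
  readonly variants: Variant[];
  constructor(typeName: string, variants: Variant[]) {
    this.typeName = typeName;
    this.variants = variants;
  }
}

// New node, added alongside it: unions such as `type ID = String | Int`.
class UnionTypeDeclaration {
  readonly kind = "UnionTypeDeclaration" as const;
  readonly typeName: string;
  readonly memberTypes: string[];
  constructor(typeName: string, memberTypes: string[]) {
    this.typeName = typeName;
    this.memberTypes = memberTypes;
  }
}

// New code can switch on `kind`; code that only handles TypeDeclaration is unaffected.
function summarize(decl: TypeDeclaration | UnionTypeDeclaration): string {
  switch (decl.kind) {
    case "TypeDeclaration":
      return `${decl.typeName}: ADT with ${decl.variants.length} constructor(s)`;
    case "UnionTypeDeclaration":
      return `${decl.typeName}: union of ${decl.memberTypes.join(" | ")}`;
  }
}

console.log(summarize(new UnionTypeDeclaration("ID", ["String", "Int"]))); // ID: union of String | Int
```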
Phase 4: Type Inference (src/type-inference.ts)
Type inference is a critical part of the Seseragi type system, allowing the compiler to automatically deduce the types of expressions. Phase 4 focuses on integrating union types into the type inference process.
- [ ] Add Union type constraint generation
- [ ] Preserve existing ADT inference logic
- [ ] Enhance type compatibility checking
This phase involves adding logic for generating type constraints for union types. It's essential to preserve the existing inference logic for ADTs to ensure that existing code continues to work correctly. The type compatibility check will be extended to handle union types, so that, for example, an Int is accepted where a String | Int is expected, while a String | Int is rejected where only an Int is expected.
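The compatibility rule for unions can be sketched with a deliberately small type representation. SType, prim, union, and isAssignable are hypothetical names; the actual representation in src/type-inference.ts is not described in the proposal. The sketch follows the usual subtyping rules for unions: a value fits a union if it fits one member, and a union value fits a target only if every member does.

```typescript
// Hypothetical, simplified type representation; not the one in src/type-inference.ts.
type SType =
  | { tag: "prim"; primName: string }   // e.g. Int, String
  | { tag: "union"; members: SType[] }; // e.g. String | Int

const prim = (primName: string): SType => ({ tag: "prim", primName });
const union = (...members: SType[]): SType => ({ tag: "union", members });

// Can a value of type `source` be used where `target` is expected?
function isAssignable(source: SType, target: SType): boolean {
  // A union source is assignable only if every one of its members is.
  if (source.tag === "union") return source.members.every((m) => isAssignable(m, target));
  // A union target accepts a source that fits at least one member.
  if (target.tag === "union") return target.members.some((m) => isAssignable(source, m));
  return source.primName === target.primName;
}

const tString = prim("String");
const tInt = prim("Int");
const tBool = prim("Bool");
console.log(isAssignable(tInt, union(tString, tInt)));                        // true
console.log(isAssignable(tBool, union(tString, tInt)));                       // false
console.log(isAssignable(union(tString, tInt), union(tString, tInt, tBool))); // true
console.log(isAssignable(union(tString, tInt), tInt));                        // false
```

Checking a union source before a union target matters: with the opposite order, String | Int would not be found assignable to String | Int | Bool, because no single member of the target accepts the whole source union.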
Phase 5: Code Generation (src/codegen.ts)
The code generation phase translates the AST into executable code; for Seseragi, the target language is TypeScript. Phase 5 focuses on adding code generation support for union types.
- [ ] Add Union type TypeScript output
- [ ] Maintain existing ADT generation
- [ ] Handle Discriminated Union vs Simple Union distinction
The code generator will be updated to output TypeScript code that correctly represents union types, while existing ADT generation logic will be maintained for compatibility. A key consideration is the distinction between discriminated unions and simple unions, which require different code generation strategies. An ADT is a discriminated union: its constructors typically become a tagged union in TypeScript, that is, a union of object types sharing a literal tag field (an ADT whose constructors carry no data could also be emitted as an enum). A simple union of existing types, such as String | Int, maps directly onto a TypeScript union type.
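The two generation strategies can be sketched as string-emitting functions. This is hypothetical: genSimpleUnion and genAdt are illustrative names, the type mapping (String to string, Int and Float to number, Bool to boolean) is assumed, and the tag and _0, _1 field names are one possible encoding rather than the actual output of src/codegen.ts.

```typescript
// Hypothetical codegen fragment; the real output format of src/codegen.ts may differ.
// ASSUMPTIONS: String -> string, Int/Float -> number, Bool -> boolean, and ADT values
// carry a literal `tag` plus positional fields _0, _1, ...
const primitiveMap = new Map<string, string>([
  ["String", "string"],
  ["Int", "number"],
  ["Float", "number"],
  ["Bool", "boolean"],
]);
const tsType = (t: string): string => primitiveMap.get(t) ?? t; // user types keep their name

// Simple union: the members already exist, so emit a plain TypeScript union.
function genSimpleUnion(typeName: string, members: string[]): string {
  return `type ${typeName} = ${members.map(tsType).join(" | ")};`;
}

interface Variant {
  ctorName: string;
  fieldTypes: string[];
}

// Discriminated union (ADT): each constructor becomes an object type with a literal tag.
function genAdt(typeName: string, variants: Variant[]): string {
  const arms = variants.map((v) => {
    const fields = v.fieldTypes.map((t, i) => `; _${i}: ${tsType(t)}`).join("");
    return `{ tag: "${v.ctorName}"${fields} }`;
  });
  return `type ${typeName} = ${arms.join(" | ")};`;
}

console.log(genSimpleUnion("ID", ["String", "Int"]));
// type ID = string | number;
console.log(genAdt("Shape", [
  { ctorName: "Circle", fieldTypes: ["Float"] },
  { ctorName: "Rectangle", fieldTypes: ["Float", "Float"] },
]));
// type Shape = { tag: "Circle"; _0: number } | { tag: "Rectangle"; _0: number; _1: number };
```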
Phase 6: Testing Strategy
A robust testing strategy is essential to ensure the correctness and stability of the changes. Phase 6 outlines a comprehensive testing plan that covers various aspects of the implementation.
- Regression Protection: All existing tests must pass
- New Syntax Tests: ADT new syntax validation
- Union Type Tests: New feature comprehensive testing
- Integration Tests: End-to-end functionality verification
Regression testing ensures that existing functionality remains intact. New syntax tests specifically validate the new ADT syntax. Union type tests provide comprehensive coverage of the new feature. Integration tests verify the end-to-end functionality of the system, ensuring that all components work together correctly.
Phase 7: VS Code Extension
The VS Code extension provides language support for Seseragi, including syntax highlighting and language server features. Phase 7 focuses on updating the extension to support the new syntax and features.
- [ ] Update syntax highlighting
- [ ] Language server support
This involves updating the syntax highlighting rules to correctly color the new ADT syntax and union types. The language server will be updated to provide features like code completion, error checking, and go-to-definition for union types.
Backward Compatibility Strategy
Maintaining backward compatibility is a key concern throughout the implementation process. The goal is to allow existing Seseragi code to continue working without modification.
- Gradual Migration: Support old syntax during transition
- Comprehensive Testing: Verify all tests pass at each step
- Rollback Ready: Keep commits small and atomic
- Minimal Impact: Achieve goals with smallest possible changes
The strategy involves supporting the old ADT syntax during a transition period, allowing developers to gradually migrate to the new syntax. Comprehensive testing at each step ensures that changes do not introduce regressions. Keeping commits small and atomic makes it easier to roll back changes if necessary. The overall approach is to achieve the goals with the smallest possible changes to minimize the risk of disruption.
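Gradual migration could be supported by a small rewriting tool. The hypothetical migrateLine function below (not an existing Seseragi tool) converts a single-line, old-style ADT declaration into the leading-pipe form. Because old ADTs and unions look alike, it assumes a rule for telling them apart: a declaration whose members are all single, already-known type names is treated as a union and left unchanged. Aliases and declarations already in the new syntax are left untouched as well.

```typescript
// Hypothetical migration helper, not an existing Seseragi tool. It handles one
// single-line declaration at a time.
// ASSUMPTION: members that are all single, already-known type names form a union.
function migrateLine(line: string, knownTypes: ReadonlySet<string>): string {
  const m = /^(\s*type\s+\w+\s*=)\s*(.*)$/.exec(line);
  if (!m) return line;                                        // not a type declaration
  const rhs = m[2].trim();
  if (rhs.startsWith("|") || !rhs.includes("|")) return line; // already new syntax, or an alias
  const members = rhs.split("|").map((s) => s.trim());
  const isUnion = members.every((mem) => /^\w+$/.test(mem) && knownTypes.has(mem));
  return isUnion ? line : `${m[1]} | ${members.join(" | ")}`; // unions stay as they are
}

const existingTypes = new Set(["String", "Int"]);
console.log(migrateLine("type Color = Red | Green | Blue", existingTypes));
// type Color = | Red | Green | Blue
console.log(migrateLine("type ID = String | Int", existingTypes));
// type ID = String | Int
```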
Expected Results
After the implementation is complete, the Seseragi type system will support both the new and old ADT syntax, as well as union types. The following examples illustrate the expected results:
// ADT (new syntax - recommended)
type Color = | Red | Green | Blue
// ADT (old syntax - compatibility maintained)
type Shape = Circle Float | Rectangle Float Float
// Union Types (new feature)
type ID = String | Int
type Response = Success | Error
// Type Alias (unchanged)
type UserId = Int
// Struct (unchanged)
struct Point { x: Int, y: Int }
These examples demonstrate the new ADT syntax, the maintained compatibility with the old syntax, the use of union types, and the unchanged syntax for type aliases and structs. These features will enhance the flexibility and expressiveness of Seseragi, making it a more powerful and versatile language.
Risk Mitigation
To mitigate potential risks during the implementation, several strategies are in place:
- Non-breaking Implementation: Existing code continues to work
- Incremental Rollout: Feature-by-feature implementation
- Extensive Testing: Regression and integration test coverage
- Community Feedback: Gather input before finalizing syntax
The non-breaking implementation ensures that existing code remains functional. The incremental rollout allows for a gradual introduction of the new features, making it easier to identify and address any issues. Extensive testing provides a safety net against regressions and integration problems. Gathering community feedback before finalizing the syntax helps to ensure that the changes meet the needs of Seseragi developers.
Related Files
Several key files will be modified as part of this implementation. These files are central to the parsing, type inference, code generation, and testing processes.
- src/parser.ts - Core parsing logic
- src/ast.ts - AST node definitions
- src/type-inference.ts - Type system integration
- src/codegen.ts - TypeScript generation
- tests/ - Comprehensive test coverage
- examples/ - Update syntax examples
- VS Code extension files
These files represent the core components of the Seseragi compiler and toolchain that will be affected by the proposed changes. Careful attention will be paid to these files to ensure that the implementation is correct and maintainable.
Definition of Done
The implementation will be considered complete when the following criteria are met:
- [ ] Both old and new ADT syntax work correctly
- [ ] Union types are fully implemented and tested
- [ ] All existing tests pass without modification
- [ ] New comprehensive test suite added
- [ ] VS Code extension supports new syntax
- [ ] Documentation updated with examples
- [ ] Performance impact assessed and acceptable
These criteria provide a clear and measurable definition of done, ensuring that the implementation is thorough and meets the goals of the proposal. The successful completion of these criteria will mark a significant step forward in the evolution of the Seseragi language.