Bug Report: TVM FFI Segfault on MacBook Pro M4 (ARM64, Metal)

by StackCamp Team

Introduction

This article addresses a critical bug encountered while using the TVM FFI (Foreign Function Interface) on a MacBook Pro with an M4 ARM64 chip. The issue manifests as a segmentation fault when following the QuickStart guide for mlc-llm. The sections below give a detailed account of the bug, the steps to reproduce it, the expected behavior, and the environment in which it was observed, so that developers and users of mlc-llm who hit the same crash can recognize it and work toward a resolution.

🐛 Bug Description

The core issue is a segmentation fault triggered within the TVM FFI when running mlc-llm on a MacBook Pro with an M4 ARM64 chip. The error occurs after following the instructions in the official mlc-llm documentation for compiling models, and it arises during the execution of a simple chat completion task using the MLCEngine class. The error banner, !!!!!!! TVM FFI encountered a Segfault !!!!!!!, indicates a low-level memory access violation inside the TVM runtime. Errors of this type are particularly problematic because they point to a fundamental incompatibility or misconfiguration in the stack, so identifying the root cause requires looking at the exact reproduction steps, the environment, and the gap between expected and actual behavior.

Detailed Explanation of the Bug

To elaborate, the segmentation fault occurs during the interaction between the Python mlc-llm library and the compiled, Metal-accelerated model library. This points to a potential issue in the model compilation process, the runtime environment, or the interaction between TVM and the Metal GPU framework on the M4 chip. Because the error surfaces during the chat completion task, the problem likely lies in the model execution phase rather than in the initial setup or loading of the model. This distinction matters for debugging, as it narrows the areas to investigate.
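One way to test this distinction directly is to separate engine construction (which loads the weights and the compiled library) from the first completion call (which dispatches the Metal kernels). Below is a minimal sketch, assuming the same paths used in the reproduction steps later in this report; whichever message never prints marks the phase in which the crash occurs:

from mlc_llm import MLCEngine

# Phase 1: construction loads the converted weights and the compiled Metal library.
engine = MLCEngine(
    model="./dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC",
    model_lib="./dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so"
)
print("engine constructed: loading finished without a crash")

# Phase 2: the first completion call actually executes the model on Metal.
engine.chat.completions.create(messages=[{"role": "user", "content": "hello"}])
print("completion finished: execution finished without a crash")

engine.terminate()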

Steps to Reproduce

To replicate this bug, follow these steps carefully; they mirror the process outlined in the mlc-llm QuickStart guide so that the issue can be reproduced consistently. Each step is critical, and any deviation might lead to a different outcome.

  1. Refer to the Documentation: Begin by visiting the official mlc-llm documentation at https://llm.mlc.ai/docs/compilation/compile_models.html. This page provides a comprehensive guide on compiling models for mlc-llm, and it’s the starting point for setting up the environment and compiling the necessary components.

  2. Verify mlc_llm Installation: Confirm that the mlc_llm command-line tool is correctly installed and functioning by running mlc_llm --help. This step ensures that the basic mlc-llm tools are accessible and properly configured in your environment.

  3. Verify TVM Installation: Ensure that the TVM (Tensor Virtual Machine) is installed and accessible by executing the Python command python -c "import tvm; print(tvm.__file__)". This command prints the path to the TVM installation, confirming that TVM is correctly set up and can be imported in Python.

  4. Create Model Directory: Create the necessary directory structure using the command mkdir -p dist/models && cd dist/models. This step sets up the file system structure required for storing the model and related files.

  5. Install Git LFS: Install Git Large File Storage (LFS) to handle large model files by running git lfs install. This is crucial for managing the large model files associated with mlc-llm.

  6. Clone LLM Module: Clone the RedPajama-INCITE-Chat-3B-v1 model from Hugging Face using git clone https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-3B-v1. This downloads the pre-trained model weights, which are essential for running the chat application.

  7. Move to Root Folder: Navigate back to the root directory with the command cd ../.. so that subsequent commands are executed from the correct location.

  8. Convert Model Weights: Convert the model weights to the required quantization format using the command:

    mlc_llm convert_weight ./dist/models/RedPajama-INCITE-Chat-3B-v1/ \
        --quantization q4f16_1 \
        -o dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC
    

    This step converts the model weights to a lower precision format (q4f16_1), which is optimized for performance on the target hardware.

  9. Generate Configuration: Generate the configuration file for the model using:

    mlc_llm gen_config ./dist/models/RedPajama-INCITE-Chat-3B-v1/ \
        --quantization q4f16_1 --conv-template redpajama_chat \
        -o dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/
    

    This command creates a configuration file tailored to the RedPajama-INCITE-Chat-3B-v1 model, specifying quantization settings and the conversation template.

  10. Compile the Model: Compile the model for the Metal backend using the command:

    mlc_llm compile ./dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json \
        --device metal -o dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so
    

    This is a crucial step that compiles the model into a Metal-optimized library, enabling GPU acceleration on Apple Silicon.

  11. Verify Library Generation: Confirm that the compiled library file (.so) has been generated by listing the contents of the dist/libs directory using ls dist/libs. This ensures that the compilation process has successfully produced the required library.

  12. Verify Configuration Files: Check the existence of the chat configuration file and tokenizer configuration in the dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC directory using ls dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC. This step verifies that all the necessary configuration files are in place.

  13. Execute the Chat Application: Run the chat application using the following command:

    mlc_llm chat dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC \
        --model-lib dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so
    

    This command launches the chat application, loading the compiled model and library, and it is at this stage that the segmentation fault is encountered.
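If the crash reproduces at this point, a useful first isolation step is to check whether the library compiled in step 10 can be loaded by the TVM runtime on its own, outside of mlc-llm. The following is a minimal sketch under the assumption that the paths from the steps above are unchanged; a crash during this bare load would point at the compiled artifact itself rather than at the engine:

import tvm

# Loading exercises dlopen and TVM module initialization, but runs no model code.
lib = tvm.runtime.load_module(
    "dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so"
)
print("library loaded:", lib)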

Code Sample and Error Messages

The following code snippet and error messages provide a clear picture of the issue:

Code to Reproduce:

from mlc_llm import MLCEngine

# Construct the engine from the converted weights and the compiled Metal library.
engine = MLCEngine(
    model="./dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC",
    model_lib="./dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so"
)
# The segfault fires during this first chat completion request.
response = engine.chat.completions.create(
    messages=[{"role": "user", "content": "hello"}]
)
print(response)
engine.terminate()  # release engine resources on a normal run
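Because the crash occurs in native code, no Python traceback is printed by default. Enabling the standard-library faulthandler before importing mlc_llm makes the interpreter dump the active Python frames when a fatal signal such as SIGSEGV arrives, which helps localize where the reproduction script dies; a minimal sketch:

import faulthandler
faulthandler.enable()  # dump Python frames on SIGSEGV and other fatal signals

# Import after enabling, so that crashes during import are caught as well.
from mlc_llm import MLCEngine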

Error Output:

mlc_llm chat dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC \
  --model-lib dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so
[2025-07-07 17:49:04] INFO auto_device.py:90: Not found device: cuda:0
[2025-07-07 17:49:05] INFO auto_device.py:90: Not found device: rocm:0
[2025-07-07 17:49:06] INFO auto_device.py:79: Found device: metal:0
[2025-07-07 17:49:06] INFO auto_device.py:90: Not found device: vulkan:0
[2025-07-07 17:49:07] INFO auto_device.py:90: Not found device: opencl:0
[2025-07-07 17:49:08] INFO auto_device.py:79: Found device: cpu:0
[2025-07-07 17:49:08] INFO auto_device.py:35: Using device: metal:0
[2025-07-07 17:49:08] INFO engine_base.py:142: Using library model: dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so
!!!!!!! TVM FFI encountered a Segfault !!!!!!!
zsh: segmentation fault  mlc_llm chat dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC --model-lib

The error output clearly shows that the segmentation fault occurs after the system identifies and selects the Metal device for execution. This suggests that the issue is likely related to the interaction between the mlc-llm runtime and the Metal framework, or possibly a problem within the compiled Metal-specific library.
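One way to probe that interaction without involving the model at all is a bare round-trip copy through a Metal buffer via the TVM runtime. This is a hedged sketch rather than a definitive test; if even this crashes, the problem sits below mlc-llm, in TVM's Metal runtime support:

import numpy as np
import tvm

dev = tvm.metal(0)
print("metal device present:", dev.exist)

# Host -> GPU copy allocates a Metal buffer through the TVM runtime,
# and the GPU -> host copy reads it back.
x = tvm.nd.array(np.arange(16, dtype="float32"), dev)
print(x.numpy())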

🤔 Expected Behavior

The expected behavior after executing the chat application command is a seamless launch of a chat dialog: the model loads, the runtime components initialize without errors, and the user is presented with an interactive interface for sending prompts to the RedPajama-INCITE-Chat-3B-v1 model and receiving responses. The absence of a segmentation fault and the successful start of the chat dialog are the key indicators of the expected outcome; the observed crash is a clear deviation that needs to be addressed.

⚙️ Environment

Understanding the environment in which the bug occurs is crucial for effective debugging. Here’s a detailed breakdown of the system configuration:

  • Platform: Metal
  • Operating System: macOS
  • Device: M4 Pro
  • MLC-LLM Installation: Installed using python -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly-cpu mlc-ai-nightly-cpu
  • TVM-Unity Installation: Installed using python -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly-cpu
  • Python Version: 3.11.13
  • GPU Driver Version: N/A (integrated GPU on M4 Pro)
  • CUDA/cuDNN Version: N/A
  • TVM Unity Hash Tag: The output of python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))" provides a detailed snapshot of the TVM build configuration. Key aspects include:
    • BUILD_STATIC_RUNTIME: OFF
    • LLVM_VERSION: 15.0.7
    • USE_METAL: ON
    • USE_LLVM: llvm-config --link-static
    • USE_BLAS: none

This environment configuration highlights the use of the Metal backend on an M4 Pro chip, with nightly builds of mlc-llm and TVM-Unity. The absence of CUDA/cuDNN and the specific versions of LLVM and other dependencies provide valuable context for diagnosing the issue. The TVM Unity Hash Tag output is particularly useful, as it details the build-time configurations that may influence the runtime behavior of TVM and mlc-llm.
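For cross-checking reports like this one, the platform details and installed wheel versions can be captured from Python directly. A small sketch; the package names are taken from the install commands listed above and are assumed to match the installed distributions:

import platform
from importlib import metadata

print(platform.platform())        # OS and kernel, expected to mention arm64 here
print(platform.machine())         # expected to be "arm64" on an M4 Pro
print(platform.python_version())  # 3.11.13 in this report

for pkg in ("mlc-llm-nightly-cpu", "mlc-ai-nightly-cpu"):
    print(pkg, metadata.version(pkg))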

Additional Context

The segmentation fault encountered on the M4 Pro MacBook Pro suggests a potential issue with the interaction between the mlc-llm framework and the Apple Silicon architecture, specifically the Metal GPU framework. This could be due to a variety of factors, including:

  • Incompatibilities between TVM and Metal: There might be underlying issues in how TVM is interacting with the Metal API on the M4 Pro chip. This could stem from driver issues, incorrect memory management, or misaligned data structures.
  • Compiler Issues: The LLVM toolchain used by TVM may be miscompiling code for the M4 architecture, producing invalid memory accesses at runtime.
  • Quantization Issues: The q4f16_1 quantization format might be introducing numerical instabilities or memory access issues on the M4 Pro.
  • Library Dependencies: The issue could be related to specific library dependencies used by mlc-llm or TVM, particularly those that interface with the GPU.

Further investigation is needed to pinpoint the exact cause, potentially involving debugging the TVM runtime, examining the generated Metal shader code, and testing different quantization formats. The detailed steps to reproduce and the comprehensive environment information provided in this report are crucial for such investigations. Understanding the interplay between the software stack and the hardware architecture is key to resolving this segmentation fault and ensuring the robust performance of mlc-llm on Apple Silicon.
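As a starting point for the shader-level inspection suggested above, TVM runtime modules expose their imported device modules, and for Metal the generated source may be retrievable. The following is a hedged sketch, assuming the library loads at all and that the shader source (rather than only a precompiled binary) was embedded at compile time:

import tvm

lib = tvm.runtime.load_module(
    "dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so"
)
for imported in lib.imported_modules:
    print("imported module type:", imported.type_key)
    if imported.type_key == "metal":
        # get_source() may return the generated shader code if it was embedded.
        print(imported.get_source()[:2000])

Recompiling the model with a different quantization in steps 8 through 10, for example --quantization q0f16, would similarly help rule the q4f16_1 path in or out.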
