{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "18655ab8",
   "metadata": {},
   "source": [
    "# Qwen3-8B Full-Parameter Fine-Tuning Verification\n",
    "\n",
     "This notebook verifies the fine-tuning capability of the **Ascend 910B CANN image** by running full-parameter SFT for Qwen3-8B with MindSpeed-LLM.\n",
    "\n",
    "**Workflow:**\n",
    "1. Environment check\n",
    "2. Prepare a sample dataset (Alpaca format)\n",
    "3. Clone the MindSpeed-LLM scripts\n",
    "4. Convert HF weights to Megatron weights\n",
    "5. Preprocess the data\n",
    "6. Start fine-tuning\n",
    "7. Run inference validation\n",
    "\n",
    "> The training parameters are set for verification mode (few iterations + short sequence length). Increase `TRAIN_ITERS` and `SEQ_LENGTH` for production use."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12b48017",
   "metadata": {},
   "source": [
    "## 0. Parameter Configuration"
   ]
  },
  {
   "cell_type": "code",
   "id": "a0fa2576",
   "metadata": {},
   "source": "import warnings\nwarnings.filterwarnings('ignore', category=DeprecationWarning)\nwarnings.filterwarnings('ignore', category=ImportWarning)\nwarnings.filterwarnings('ignore', category=UserWarning)\n\nfrom pathlib import Path\n\n# ===== Path configuration =====\nHF_MODEL_DIR = Path('/opt/app-root/src/models/Qwen3-8B')\nWORK_DIR = Path('/opt/app-root/src/Qwen3-8B-work-dir')\nMINDSPEED_LLM_DIR = WORK_DIR / 'MindSpeed-LLM'\nDATA_DIR = WORK_DIR / 'finetune_dataset'\nRAW_DATA_FILE = DATA_DIR / 'alpaca_sample.jsonl'\nPROCESSED_DATA_PREFIX = DATA_DIR / 'alpaca'\nOUTPUT_DIR = WORK_DIR / 'output' / 'qwen3_8b_finetuned'\nLOGS_DIR = WORK_DIR / 'logs'\n\n# ===== Optional: real dataset path =====\nALPACA_PARQUET = Path('/opt/app-root/src/datasets/alpaca/train-00000-of-00001-a09b74b3ef9c3b56.parquet')\n\n# ===== Ascend environment scripts =====\nCANN_ENV = '/usr/local/Ascend/cann/set_env.sh'\nATB_ENV = '/usr/local/Ascend/nnal/atb/set_env.sh'\n\n# ===== Parallelism configuration (must match weight conversion) =====\nTP = 2   # With TP=1, one card holds about 4.1B parameters; fp32 gradient buffers + bf16 weights require about 30 GiB, exceeding the 910B 29 GiB memory limit\nPP = 2   # At least TPxPP=4 NPUs are required; for a single card, set TP=1 and PP=1 (OOM is possible)\n\n# ===== Weight conversion output (path includes parallel settings to avoid reusing stale weights after TP/PP changes) =====\nMCORE_WEIGHTS_DIR = WORK_DIR / 'model_weights' / f'qwen3_mcore_tp{TP}_pp{PP}'\n\n# ===== Training hyperparameters (verification mode) =====\nSEQ_LENGTH = 512     # 4096 is recommended for production\nTRAIN_ITERS = 50     # 2000+ is recommended for production\nMBS = 1\nLR = 1.25e-6\nMIN_LR = 1.25e-7\n\n# ===== Data preprocessing =====\nHANDLER_NAME = 'AlpacaStyleInstructionHandler'\nTOKENIZER_TYPE = 'PretrainedFromHF'\nPROMPT_TYPE = 'qwen3'\nENABLE_THINKING = 'none'\n\nprint('Configuration loaded')\nprint(f'  Model: {HF_MODEL_DIR}')\nprint(f'  Dataset: {ALPACA_PARQUET}' if ALPACA_PARQUET.exists() else '  Dataset: not found, using built-in sample data')\nprint(f'  TP={TP}, PP={PP}, SEQ={SEQ_LENGTH}, ITERS={TRAIN_ITERS}')",
   "outputs": [],
   "execution_count": null
  },
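  {
   "cell_type": "markdown",
   "id": "a1f2c3d4",
   "metadata": {},
   "source": [
    "The TP comment above can be sanity-checked with rough arithmetic (a sketch: the 4.1B-per-card figure is an approximation, and the byte counts below ignore distributed-optimizer shards and activations, which account for the rest of the ~30 GiB estimate):\n",
    "\n",
    "```python\n",
    "params_per_card = 4.1e9       # approx. half of Qwen3-8B with PP=2, TP=1\n",
    "bytes_per_param = 2 + 4       # bf16 weights + fp32 gradient buffer\n",
    "gib = params_per_card * bytes_per_param / 2**30\n",
    "print(f'{gib:.1f} GiB before optimizer shards and activations')  # ~22.9 GiB\n",
    "```\n",
    "\n",
    "Setting TP=2 halves the per-card share, which is why this notebook defaults to TP=2, PP=2."
   ]
  },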
  {
   "cell_type": "markdown",
   "id": "15d10a9a",
   "metadata": {},
   "source": [
    "## Helper Function"
   ]
  },
  {
   "cell_type": "code",
   "id": "7eb53b45",
   "metadata": {},
   "source": "import os\nimport subprocess\n\n_SUPPRESS_WARNINGS = 'ignore::DeprecationWarning,ignore::ImportWarning,ignore::UserWarning'\n\ndef run_cmd(cmd, cwd=None, check=True):\n    'Run a bash command in the Ascend environment and stream output in real time'\n    env_prefix = f'source {CANN_ENV} && source {ATB_ENV}'\n    full_cmd = f'{env_prefix} && {cmd}'\n    print(f'$ {cmd}\\n')\n    run_env = os.environ.copy()\n    run_env['PYTHONWARNINGS'] = _SUPPRESS_WARNINGS\n    result = subprocess.run(\n        ['bash', '-lc', full_cmd],\n        cwd=str(cwd or WORK_DIR),\n        text=True,\n        env=run_env,\n    )\n    if check and result.returncode != 0:\n        raise RuntimeError(f'Command failed with return code: {result.returncode}')\n    return result\n\nprint('Helper function defined: run_cmd()')",
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "0d2cbf3b",
   "metadata": {},
   "source": [
    "## 1. Environment Check"
   ]
  },
  {
   "cell_type": "code",
   "id": "1643dfe5",
   "metadata": {},
   "source": "import warnings\nwith warnings.catch_warnings():\n    warnings.simplefilter('ignore', DeprecationWarning)\n    warnings.simplefilter('ignore', ImportWarning)\n    warnings.simplefilter('ignore', UserWarning)\n    import torch\n    import torch_npu\n\nprint('=' * 60)\nprint('Environment Check')\nprint('=' * 60)\n\n# PyTorch & NPU\nprint(f'PyTorch:    {torch.__version__}')\nprint(f'torch_npu:  {torch_npu.__version__}')\nnproc = torch.npu.device_count()\nprint(f'NPU count:  {nproc}')\nfor i in range(nproc):\n    print(f'  NPU {i}: {torch.npu.get_device_name(i)}')\n\n# MindSpeed\nwith warnings.catch_warnings():\n    warnings.simplefilter('ignore', DeprecationWarning)\n    warnings.simplefilter('ignore', ImportWarning)\n    warnings.simplefilter('ignore', UserWarning)\n    import mindspeed\n    import mindspeed_llm\nprint('MindSpeed:     installed')\nprint('MindSpeed-LLM: installed')\n\n# Model files\nprint(f'\\nModel directory: {HF_MODEL_DIR}')\nassert HF_MODEL_DIR.exists(), f'Model directory does not exist: {HF_MODEL_DIR}'\nmodel_files = sorted(HF_MODEL_DIR.glob('*'))\nfor f in model_files[:5]:\n    if f.is_file():\n        print(f'  {f.name} ({f.stat().st_size / 1e9:.2f} GB)')\nif len(model_files) > 5:\n    print(f'  ... {len(model_files)} files in total')\n\n# Parallelism validation\nassert nproc >= TP * PP, f'NPU count ({nproc}) < TP*PP ({TP*PP}); reduce PP'\nDP = nproc // (TP * PP)\nGBS = DP * MBS\nprint(f'\\nParallelism: TP={TP}, PP={PP}, DP={DP}, GBS={GBS}')\nassert torch.npu.is_available(), 'NPU is not available'\nprint('\\nEnvironment check passed!')",
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "a194e018",
   "metadata": {},
   "source": [
    "## 2. Prepare a Sample Dataset\n",
    "\n",
    "Create sample data in Alpaca format to verify the fine-tuning workflow.\n",
    "\n",
    "To use a real dataset, place a JSONL file at `RAW_DATA_FILE`, with one JSON object per line:\n",
    "```json\n",
    "{\"instruction\": \"...\", \"input\": \"...\", \"output\": \"...\"}\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "id": "6d845761",
   "metadata": {},
   "source": "import json\nimport warnings\nimport pandas as pd\n\nDATA_DIR.mkdir(parents=True, exist_ok=True)\n\nif ALPACA_PARQUET.exists():\n    print(f'Loading Alpaca dataset: {ALPACA_PARQUET.name}')\n    with warnings.catch_warnings():\n        warnings.simplefilter('ignore', DeprecationWarning)\n        df = pd.read_parquet(ALPACA_PARQUET)\n    print(f'{len(df)} samples loaded, columns: {list(df.columns)}')\n\n    # Convert to JSONL (instruction / input / output)\n    with open(RAW_DATA_FILE, 'w', encoding='utf-8') as f:\n        for item in df[['instruction', 'input', 'output']].to_dict('records'):\n            item['input'] = item.get('input') or ''\n            f.write(json.dumps(item, ensure_ascii=False) + '\\n')\n\n    print(f'Converted to JSONL: {RAW_DATA_FILE}')\n    print('\\nSample records:')\n    for item in df[['instruction', 'input', 'output']].head(3).to_dict('records'):\n        inp = f' {item[\"input\"]}' if item['input'] else ''\n        print(f'  Q: {item[\"instruction\"][:80]}{inp[:40]}')\n        print(f'  A: {str(item[\"output\"])[:80]}')\nelse:\n    print('Alpaca dataset not found, using built-in sample data\\n')\n    sample_data = [\n        {'instruction': 'Translate the following sentence into French', 'input': 'The weather is nice today.', 'output': \"Il fait beau aujourd'hui.\"},\n        {'instruction': 'Translate the following sentence into Spanish', 'input': 'I like programming.', 'output': 'Me gusta programar.'},\n        {'instruction': 'Summarize the sentence in one short phrase', 'input': 'Machine learning is fascinating and widely used in many fields.', 'output': 'Machine learning is broadly useful.'},\n        {'instruction': 'Rewrite the sentence in a more formal tone', 'input': 'Hello, how are you?', 'output': 'Hello, how are you doing today?'},\n        {'instruction': 'Introduce Python in one sentence', 'input': '', 'output': 'Python is a high-level general-purpose programming language known for its readability and rich ecosystem.'},\n        {'instruction': 'List three common sorting algorithms', 'input': '', 'output': 'Three common sorting algorithms are bubble sort, quicksort, and merge sort.'},\n        {'instruction': 'Explain what deep learning is', 'input': '', 'output': 'Deep learning is a branch of machine learning that uses multi-layer neural networks to learn hierarchical representations of data.'},\n        {'instruction': 'Write a Python function to add two numbers', 'input': '', 'output': 'def add(a, b):\\n    return a + b'},\n        {'instruction': 'Rewrite the sentence to be more concise', 'input': 'Artificial intelligence is changing the world.', 'output': 'AI is transforming the world.'},\n        {'instruction': 'What is a GPU?', 'input': '', 'output': 'A GPU is a graphics processing unit designed to accelerate highly parallel computation, especially for training and inference workloads.'},\n    ]\n    with open(RAW_DATA_FILE, 'w', encoding='utf-8') as f:\n        for item in sample_data:\n            f.write(json.dumps(item, ensure_ascii=False) + '\\n')\n    print(f'Sample dataset created: {RAW_DATA_FILE}')\n    print(f'{len(sample_data)} samples in total')",
   "outputs": [],
   "execution_count": null
  },
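  {
   "cell_type": "markdown",
   "id": "b2e3f4a5",
   "metadata": {},
   "source": [
    "Before preprocessing, the JSONL file can be sanity-checked with a small helper (a sketch, not part of MindSpeed-LLM; `check_alpaca_jsonl` is a name introduced here): every line must parse as a JSON object carrying the three Alpaca keys.\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "def check_alpaca_jsonl(path):\n",
    "    # Return the record count; raise if any line is not an Alpaca-style object.\n",
    "    required = {'instruction', 'input', 'output'}\n",
    "    count = 0\n",
    "    for n, line in enumerate(open(path, encoding='utf-8'), 1):\n",
    "        rec = json.loads(line)\n",
    "        assert required <= rec.keys(), f'line {n} missing {required - rec.keys()}'\n",
    "        count = n\n",
    "    return count\n",
    "\n",
    "# In this notebook: check_alpaca_jsonl(RAW_DATA_FILE)\n",
    "```"
   ]
  },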
  {
   "cell_type": "markdown",
   "id": "9c4692a2",
   "metadata": {},
   "source": [
    "## 3. Clone MindSpeed-LLM\n",
    "\n",
    "The `mindspeed_llm` Python package is already installed in the image, but the training scripts (`convert_ckpt_v2.py`, `preprocess_data.py`, `posttrain_gpt.py`, and others) must be run from the repository directory."
   ]
  },
  {
   "cell_type": "code",
   "id": "511c1c4d",
   "metadata": {},
   "source": [
    "if MINDSPEED_LLM_DIR.exists():\n",
    "    print(f'Already exists: {MINDSPEED_LLM_DIR}')\n",
    "else:\n",
    "    print('Cloning MindSpeed-LLM (shallow clone)...')\n",
    "    run_cmd(f'git clone --depth 1 https://gitcode.com/ascend/MindSpeed-LLM.git {MINDSPEED_LLM_DIR}')\n",
    "\n",
    "# Validate required scripts\n",
    "scripts = [\n",
    "    ('Weight conversion', 'convert_ckpt_v2.py'),\n",
    "    ('Data preprocessing', 'preprocess_data.py'),\n",
    "    ('Fine-tuning', 'posttrain_gpt.py'),\n",
    "    ('Inference', 'inference.py'),\n",
    "]\n",
    "for name, script in scripts:\n",
    "    exists = (MINDSPEED_LLM_DIR / script).exists()\n",
    "    print(f'  [{name}] {script}: {\"OK\" if exists else \"MISSING\"}')\n",
    "\n",
    "assert all((MINDSPEED_LLM_DIR / s).exists() for _, s in scripts), 'Required scripts are missing'\n",
    "print('\\nScript check passed!')"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "331e0d10",
   "metadata": {},
   "source": [
    "## 4. HF Weight to Megatron Weight Conversion\n",
    "\n",
    "Convert HuggingFace-format weights to Megatron format, split by TP/PP. The first conversion usually takes about 5-10 minutes."
   ]
  },
  {
   "cell_type": "code",
   "id": "463dd7da",
   "metadata": {},
   "source": [
    "MCORE_WEIGHTS_DIR.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "# Check whether conversion has already been completed\n",
    "converted = any(MCORE_WEIGHTS_DIR.glob('iter_*'))\n",
    "\n",
    "if converted:\n",
    "    print(f'Weights already exist, skipping conversion: {MCORE_WEIGHTS_DIR}')\n",
    "    for p in sorted(MCORE_WEIGHTS_DIR.iterdir()):\n",
    "        print(f'  {p.name}')\n",
    "else:\n",
    "    convert_cmd = ' && '.join([\n",
    "        f'cd {MINDSPEED_LLM_DIR}',\n",
    "        f'python convert_ckpt_v2.py'\n",
    "        ' --load-model-type hf'\n",
    "        ' --save-model-type mg'\n",
    "        f' --target-tensor-parallel-size {TP}'\n",
    "        f' --target-pipeline-parallel-size {PP}'\n",
    "        f' --load-dir {HF_MODEL_DIR}'\n",
    "        f' --save-dir {MCORE_WEIGHTS_DIR}'\n",
    "        ' --model-type-hf qwen3',\n",
    "    ])\n",
    "    print('Running weight conversion (about 5-10 minutes)...')\n",
    "    run_cmd(convert_cmd, cwd=MINDSPEED_LLM_DIR)\n",
    "    print('Weight conversion completed!')\n",
    "    for p in sorted(MCORE_WEIGHTS_DIR.iterdir()):\n",
    "        print(f'  {p.name}')"
   ],
   "outputs": [],
   "execution_count": null
  },
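  {
   "cell_type": "markdown",
   "id": "c3f4a5b6",
   "metadata": {},
   "source": [
    "With TP=2 and PP=2 the converted checkpoint usually follows the Megatron on-disk convention of one `mp_rank_<tp>_<pp>` directory per model-parallel rank (a sketch: exact names can vary between MindSpeed-LLM versions, so treat the directory listing printed above as authoritative):\n",
    "\n",
    "```python\n",
    "# Expected rank directories for TP=2, PP=2 (Megatron naming convention)\n",
    "TP, PP = 2, 2\n",
    "ranks = [f'mp_rank_{tp:02d}_{pp:03d}' for pp in range(PP) for tp in range(TP)]\n",
    "print(ranks)  # ['mp_rank_00_000', 'mp_rank_01_000', 'mp_rank_00_001', 'mp_rank_01_001']\n",
    "```"
   ]
  },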
  {
   "cell_type": "markdown",
   "id": "419d028a",
   "metadata": {},
   "source": [
    "## 5. Data Preprocessing\n",
    "\n",
    "Convert Alpaca-format JSONL data into the binary format required by MindSpeed-LLM training."
   ]
  },
  {
   "cell_type": "code",
   "id": "f68febbf",
   "metadata": {},
   "source": [
    "preprocess_cmd = ' && '.join([\n",
    "    f'cd {MINDSPEED_LLM_DIR}',\n",
    "    f'python preprocess_data.py'\n",
    "    f' --input {RAW_DATA_FILE}'\n",
    "    f' --tokenizer-name-or-path {HF_MODEL_DIR}'\n",
    "    f' --output-prefix {PROCESSED_DATA_PREFIX}'\n",
    "    f' --handler-name {HANDLER_NAME}'\n",
    "    f' --tokenizer-type {TOKENIZER_TYPE}'\n",
    "    ' --workers 4'\n",
    "    ' --log-interval 1'\n",
    "    f' --enable-thinking {ENABLE_THINKING}'\n",
    "    f' --prompt-type {PROMPT_TYPE}',\n",
    "])\n",
    "\n",
    "print('Running data preprocessing...')\n",
    "run_cmd(preprocess_cmd, cwd=MINDSPEED_LLM_DIR)\n",
    "\n",
    "# Verify outputs\n",
    "print('\\nPreprocessing outputs:')\n",
    "for f in sorted(PROCESSED_DATA_PREFIX.parent.glob('alpaca*')):\n",
    "    print(f'  {f.name} ({f.stat().st_size / 1024:.1f} KB)')\n",
    "print('Data preprocessing completed!')"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "67501275",
   "metadata": {},
   "source": [
    "## 6. Start Fine-Tuning\n",
    "\n",
    "Run full-parameter SFT fine-tuning with MindSpeed-LLM. Training logs are streamed to the notebook in real time.\n",
    "\n",
    "> In verification mode, `TRAIN_ITERS=50`. For a full fine-tuning run, 2000+ iterations are recommended."
   ]
  },
  {
   "cell_type": "code",
   "id": "16c0ef7e",
   "metadata": {},
   "source": [
    "import torch\n",
    "\n",
    "nproc = torch.npu.device_count()\n",
    "DP = nproc // (TP * PP)\n",
    "GBS = DP * MBS\n",
    "\n",
    "LOGS_DIR.mkdir(parents=True, exist_ok=True)\n",
    "OUTPUT_DIR.mkdir(parents=True, exist_ok=True)\n",
    "\n",
    "# Environment variables\n",
    "env = ' && '.join([\n",
    "    f'cd {MINDSPEED_LLM_DIR}',\n",
    "    'export CUDA_DEVICE_MAX_CONNECTIONS=1',\n",
    "    'export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True',\n",
    "])\n",
    "\n",
    "# Distributed torchrun arguments\n",
    "distributed = ' '.join([\n",
    "    'torchrun',\n",
    "    f'--nproc_per_node {nproc}',\n",
    "    '--nnodes 1 --node_rank 0',\n",
    "    '--master_addr localhost --master_port 6000',\n",
    "])\n",
    "\n",
    "# Model architecture\n",
    "model_args = ' '.join([\n",
    "    '--use-mcore-models',\n",
    "    '--spec mindspeed_llm.tasks.models.spec.qwen3_spec layer_spec',\n",
    "    '--kv-channels 128 --qk-layernorm',\n",
    "    f'--tensor-model-parallel-size {TP}',\n",
    "    f'--pipeline-model-parallel-size {PP}',\n",
    "    '--sequence-parallel --use-distributed-optimizer --use-flash-attn',\n",
    "    '--num-layers 36 --hidden-size 4096 --num-attention-heads 32',\n",
    "    '--ffn-hidden-size 12288 --max-position-embeddings 32768',\n",
    "    f'--seq-length {SEQ_LENGTH}',\n",
    "    '--make-vocab-size-divisible-by 1 --padded-vocab-size 151936',\n",
    "    '--rotary-base 1000000 --use-rotary-position-embeddings',\n",
    "])\n",
    "\n",
    "# Training hyperparameters\n",
    "train_args = ' '.join([\n",
    "    f'--micro-batch-size {MBS} --global-batch-size {GBS}',\n",
    "    '--disable-bias-linear --swiglu',\n",
    "    f'--train-iters {TRAIN_ITERS}',\n",
    "    '--tokenizer-type PretrainedFromHF',\n",
    "    f'--tokenizer-name-or-path {HF_MODEL_DIR}',\n",
    "    '--normalization RMSNorm --position-embedding-type rope',\n",
    "    '--norm-epsilon 1e-6 --hidden-dropout 0 --attention-dropout 0',\n",
    "    '--no-gradient-accumulation-fusion --attention-softmax-in-fp32',\n",
    "    '--exit-on-missing-checkpoint --no-masked-softmax-fusion',\n",
    "    '--group-query-attention --untie-embeddings-and-output-weights',\n",
    "    '--num-query-groups 8',\n",
    "    f'--min-lr {MIN_LR} --lr {LR}',\n",
    "    '--weight-decay 1e-1 --clip-grad 1.0',\n",
    "    '--adam-beta1 0.9 --adam-beta2 0.95 --initial-loss-scale 4096',\n",
    "    '--no-load-optim --no-load-rng --seed 42 --bf16',\n",
    "])\n",
    "\n",
    "# Data and outputs\n",
    "data_args = ' '.join([\n",
    "    f'--data-path {PROCESSED_DATA_PREFIX}',\n",
    "    '--split 100,0,0',\n",
    "    '--log-interval 1',\n",
    "    f'--save-interval {TRAIN_ITERS}',\n",
    "    f'--eval-interval {TRAIN_ITERS} --eval-iters 0',\n",
    "])\n",
    "\n",
    "# Fine-tuning configuration\n",
    "tune_args = ' '.join([\n",
    "    '--finetune --stage sft --is-instruction-dataset',\n",
    "    '--prompt-type qwen3 --no-pad-to-seq-lengths',\n",
    "    '--distributed-backend nccl',\n",
    "    f'--load {MCORE_WEIGHTS_DIR} --save {OUTPUT_DIR}',\n",
    "    '--transformer-impl local',\n",
    "    '--no-save-optim --no-save-rng',\n",
    "])\n",
    "\n",
    "cmd = f'{env} && {distributed} posttrain_gpt.py {model_args} {train_args} {data_args} {tune_args}'\n",
    "\n",
    "print(f'Training configuration: {nproc} NPU, TP={TP}, PP={PP}, DP={DP}')\n",
    "print(f'GBS={GBS}, MBS={MBS}, SEQ={SEQ_LENGTH}, ITERS={TRAIN_ITERS}')\n",
    "print(f'\\nStarting training...\\n')\n",
    "run_cmd(cmd, cwd=MINDSPEED_LLM_DIR)\n",
    "print(f'\\nTraining completed! Weights saved to: {OUTPUT_DIR}')"
   ],
   "outputs": [],
   "execution_count": null
  },
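  {
   "cell_type": "markdown",
   "id": "d4a5b6c7",
   "metadata": {},
   "source": [
    "Megatron-style training logs report one `lm loss` value per iteration; those numbers can be pulled out of captured output for a quick convergence check (a sketch: the sample line below is an assumed format, and the exact layout may differ between versions):\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "# A representative Megatron-style log line (assumed, not captured output)\n",
    "sample = 'iteration       10/      50 | consumed samples: 40 | lm loss: 1.234567E+00 |'\n",
    "match = re.search(r'lm loss: ([0-9.]+E[+-][0-9]+)', sample)\n",
    "print(float(match.group(1)))  # 1.234567\n",
    "```"
   ]
  },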
  {
   "cell_type": "markdown",
   "id": "d077bc56",
   "metadata": {},
   "source": [
    "## 7. Inference Validation\n",
    "\n",
    "Load the fine-tuned weights and run a generation test."
   ]
  },
  {
   "cell_type": "code",
   "id": "09ae43f0",
   "metadata": {},
   "source": "import os\nimport subprocess\n\nimport torch\n\nnproc = torch.npu.device_count()\n\nenv = ' && '.join([\n    f'cd {MINDSPEED_LLM_DIR}',\n    'export CUDA_DEVICE_MAX_CONNECTIONS=1',\n])\n\ndistributed = ' '.join([\n    'torchrun',\n    f'--nproc_per_node {nproc}',\n    '--nnodes 1 --node_rank 0',\n    '--master_addr localhost --master_port 6001',\n])\n\ninfer_args = ' '.join([\n    '--use-mcore-models',\n    '--spec mindspeed_llm.tasks.models.spec.qwen3_spec layer_spec',\n    '--qk-layernorm',\n    f'--tensor-model-parallel-size {TP}',\n    f'--pipeline-model-parallel-size {PP}',\n    '--num-layers 36 --hidden-size 4096 --num-attention-heads 32',\n    '--ffn-hidden-size 12288',\n    f'--max-position-embeddings {SEQ_LENGTH} --seq-length {SEQ_LENGTH}',\n    '--disable-bias-linear',\n    '--group-query-attention --num-query-groups 8',\n    '--swiglu --use-fused-swiglu',\n    '--normalization RMSNorm --norm-epsilon 1e-6 --use-fused-rmsnorm',\n    '--position-embedding-type rope --rotary-base 1000000 --use-fused-rotary-pos-emb',\n    '--make-vocab-size-divisible-by 1 --padded-vocab-size 151936',\n    '--micro-batch-size 1 --max-new-tokens 256',\n    '--tokenizer-type PretrainedFromHF',\n    f'--tokenizer-name-or-path {HF_MODEL_DIR}',\n    '--tokenizer-not-use-fast',\n    '--hidden-dropout 0 --attention-dropout 0',\n    '--untie-embeddings-and-output-weights',\n    '--no-gradient-accumulation-fusion --attention-softmax-in-fp32',\n    '--seed 42',\n    f'--load {OUTPUT_DIR}',\n    '--exit-on-missing-checkpoint --transformer-impl local',\n])\n\ncmd = f'{env} && {distributed} inference.py {infer_args}'\nfull_cmd = f'source {CANN_ENV} && source {ATB_ENV} && {cmd}'\n\nprint('Starting inference...\\n')\nrun_env = os.environ.copy()\nrun_env['PYTHONWARNINGS'] = _SUPPRESS_WARNINGS\nresult = subprocess.run(\n    ['bash', '-lc', full_cmd],\n    cwd=str(MINDSPEED_LLM_DIR),\n    text=True,\n    input='q\\n',   # Exit interactive chat mode automatically after inference.py finishes the default 4 generation rounds and enters input(); sending q terminates it\n    env=run_env,\n)\nif result.returncode != 0:\n    print(f'\\nInference return code: {result.returncode}')\nprint('\\nInference completed!')",
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "f87ecc9d",
   "metadata": {},
   "source": [
    "## Using a Real Dataset\n",
    "\n",
    "After verification succeeds, use the following steps for full fine-tuning with a real dataset:\n",
    "\n",
    "1. **Prepare the data**: place an Alpaca/ShareGPT/Pairwise dataset inside the container\n",
    "   - Alpaca: `{\"instruction\": \"...\", \"input\": \"...\", \"output\": \"...\"}`\n",
    "   - Change `HANDLER_NAME` to the matching handler\n",
    "\n",
    "2. **Tune the parameters**:\n",
    "   - `SEQ_LENGTH = 4096` to match the model context length\n",
    "   - `TRAIN_ITERS = 2000+` adjusted to the dataset size\n",
    "   - `GBS` adjusted to the NPU count and dataset size\n",
    "\n",
    "3. **Checkpoint interval**: change `--save-interval` in the training cell to save checkpoints periodically\n",
    "\n",
    "4. **enable-thinking**:\n",
    "   - `true` to process all data with slow-thinking mode\n",
    "   - `false` to process all data with fast-thinking mode\n",
    "   - `none` to mix fast and slow thinking (default)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.12",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}