{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2d54445a-6458-435a-961d-dea0ca6b9ffb",
   "metadata": {},
   "source": [
    "# 🌳 LGBM\n",
    "\n",
    "LGBM（Light Gradient Boosting Machine），轻量梯度提升树，由Microsoft开发。\n",
    "\n",
    "官网：https://lightgbm.readthedocs.io/en/stable/\n",
    "\n",
    "安装：`pip install lightgbm`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e3f5836c-a413-4ad9-ba8e-98912c122b6a",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])\n"
     ]
    }
   ],
   "source": [
    "# 导入加利福尼亚房价数据\n",
    "from sklearn.datasets import fetch_california_housing\n",
    "from sklearn.model_selection import train_test_split as TTS\n",
    "\n",
    "data = fetch_california_housing()\n",
    "print(data.keys())\n",
    "\n",
    "# 划分数据集\n",
    "x = data['data']\n",
    "y = data['target']\n",
    "train_x, test_x, train_y, test_y = TTS(x, y, test_size=0.3, random_state=22)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "098ccd02-53ef-44ef-9a1d-734de536ac86",
   "metadata": {},
   "source": [
    "## 壹丨简单使用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "6c6358f8-57ce-4134-a19f-fff95568b51e",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000553 seconds.\n",
      "You can set `force_col_wise=true` to remove the overhead.\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 14448, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.070666\n"
     ]
    }
   ],
   "source": [
    "import lightgbm as lgb\n",
    "\n",
    "# 需要将数据转换成LGBM的数据格式\n",
    "train_data = lgb.Dataset(train_x, train_y)\n",
    "\n",
    "param = {'seed': 22}\n",
    "reg = lgb.train(param, train_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20e916a6-de87-429f-b2b2-975d7c1dfce2",
   "metadata": {},
   "source": [
    "在LGBM中，使用直方图计算分枝方式，会按照行和列方向计算，没有设置对应参数时，会选择运行更快的分枝方式"
   ]
  },
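  {
   "cell_type": "markdown",
   "id": "hist-split-sketch-01",
   "metadata": {},
   "source": [
    "To make the histogram idea concrete, here is a minimal pure-Python sketch of a histogram-based split search for a single feature. It is an illustration under simplifying assumptions, not LightGBM's implementation: the function name `best_histogram_split`, the equal-width binning, and the use of raw target sums (instead of gradient and Hessian statistics) are all choices made for this example. As far as the documentation describes them, the column-wise and row-wise modes produce the same histograms and differ only in how the construction work is organised across threads.\n",
    "\n",
    "```python\n",
    "def best_histogram_split(values, targets, n_bins=4):\n",
    "    # Toy histogram-based split search for one feature (illustration only).\n",
    "    lo, hi = min(values), max(values)\n",
    "    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature\n",
    "\n",
    "    # Build the histogram: per-bin sum of targets and per-bin row count.\n",
    "    sums = [0.0] * n_bins\n",
    "    counts = [0] * n_bins\n",
    "    for v, t in zip(values, targets):\n",
    "        b = min(int((v - lo) / width), n_bins - 1)\n",
    "        sums[b] += t\n",
    "        counts[b] += 1\n",
    "\n",
    "    total_sum, total_cnt = sum(sums), sum(counts)\n",
    "    best_gain, best_threshold = 0.0, None\n",
    "    left_sum, left_cnt = 0.0, 0\n",
    "\n",
    "    # Only bin boundaries are candidate thresholds, not every distinct value.\n",
    "    for b in range(n_bins - 1):\n",
    "        left_sum += sums[b]\n",
    "        left_cnt += counts[b]\n",
    "        right_cnt = total_cnt - left_cnt\n",
    "        if left_cnt == 0 or right_cnt == 0:\n",
    "            continue\n",
    "        right_sum = total_sum - left_sum\n",
    "        # Reduction in total squared error obtained by splitting at this boundary.\n",
    "        gain = (left_sum ** 2 / left_cnt + right_sum ** 2 / right_cnt\n",
    "                - total_sum ** 2 / total_cnt)\n",
    "        if gain > best_gain:\n",
    "            best_gain, best_threshold = gain, lo + (b + 1) * width\n",
    "    return best_threshold, best_gain\n",
    "\n",
    "\n",
    "threshold, gain = best_histogram_split(\n",
    "    [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],\n",
    "    [1.0, 1.0, 1.0, 5.0, 5.0, 5.0],\n",
    ")\n",
    "print(threshold, gain)  # 5.0 24.0\n",
    "```\n",
    "\n",
    "Because the scan runs over bins rather than over sorted rows, its cost depends on the number of bins, which is the main reason histogram-based boosting is fast on large datasets."
   ]
  },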
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "8f19b3b9-0db2-4d62-a571-4ee53cb16d68",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 14448, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.070666\n"
     ]
    }
   ],
   "source": [
    "# 设置分支方式\n",
    "param = {'seed': 22, 'force_col_wise': True}\n",
    "reg = lgb.train(param, train_data, num_boost_round=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "bd02a3ab-c677-43b0-ab7e-adaf77fd5937",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.6860233557436403"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 预测\n",
    "from sklearn.metrics import mean_squared_error as mse\n",
    "\n",
    "pred = reg.predict(test_x)\n",
    "mse(test_y, pred, squared=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "af2818ed-82d5-43aa-92fc-61eaf9ba65b2",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11556, number of used features: 8\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11556, number of used features: 8\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11556, number of used features: 8\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11556, number of used features: 8\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11556, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.070080\n",
      "[LightGBM] [Info] Start training from score 2.067682\n",
      "[LightGBM] [Info] Start training from score 2.067000\n",
      "[LightGBM] [Info] Start training from score 2.071840\n",
      "[LightGBM] [Info] Start training from score 2.075814\n"
     ]
    }
   ],
   "source": [
    "# 交叉验证\n",
    "# stratified用于分类任务中均衡kfold中数据的分布\n",
    "param = {'seed': 22, 'metric': 'rmse', 'force_col_wise': True}\n",
    "result = lgb.cv(param, train_data, nfold=5, num_boost_round=10, seed=22, stratified=False)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
