{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "91e3c32146c38ad9",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "# 🍡 模型融合\n",
    "\n",
    "融合多个强学习器，修正单一算法在偏差和方差的问题。\n",
    "\n",
    "均值法（Averaging）：\n",
    "* 将所有算法结果普通平均、加权平均。\n",
    "* 多个强学习器相互独立时，强学习器平均后的误差一定小于单一学习器\n",
    "* 加权和普通平均效果不相上下\n",
    "\n",
    "投票法（Voting）\n",
    "* 包含多数投票、相对多数投票、加权投票、软投票，仅适用分类\n",
    "* 绝对多数投票要求样本类别占比50%以上，否则输出空值\n",
    "* 相对多数投票少数服从多数\n",
    "* 软投票是根据强学习器概率和少数服从多数\n",
    "\n",
    "堆叠法（Stacking）\n",
    "* 建立一个元学习器与一个（或多个）个体学习器，将原始数据分为train和test。使用train训练个体学习器，使用个体学习器在train输出结果，作为元学习器的训练数据，最终由元学习器在test上输出结果\n",
    "* 如果只有一个个体学习器，会执行交叉验证得到多组输出结果作为元学习器的训练数据\n",
    "\n",
    "混合法（Blending）\n",
    "* 一种特殊的Stacking\n",
    "* 建立一个元学习器与一个（或多个）个体学习器，将原始数据分为train、val、test，使用train训练个体学习器，在val上输出结果，作为元学习器的训练数据，最终由元学习器在test上输出结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "8af9ae9a4298d1d3",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:20:04.696581900Z",
     "start_time": "2023-11-14T18:20:03.356286400Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])\n"
     ]
    }
   ],
   "source": [
    "# 导入加利福尼亚房价数据\n",
    "from sklearn.datasets import fetch_california_housing\n",
    "from sklearn.model_selection import train_test_split as TTS\n",
    "\n",
    "data = fetch_california_housing()\n",
    "print(data.keys())\n",
    "\n",
    "# 划分数据集\n",
    "x = data['data']\n",
    "y = data['target']\n",
    "train_x, test_x, train_y, test_y = TTS(x, y, test_size=0.3, random_state=22)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "52b4ce264b0a829a",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:20:58.679709700Z",
     "start_time": "2023-11-14T18:20:57.756496400Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.ensemble import GradientBoostingRegressor\n",
    "from xgboost import XGBRegressor\n",
    "from lightgbm import LGBMRegressor\n",
    "\n",
    "from sklearn.model_selection import cross_val_score, KFold\n",
    "import time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "45d6549c7c13f034",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:21:07.821704100Z",
     "start_time": "2023-11-14T18:21:07.812181900Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "cv = KFold(n_splits=5, shuffle=True, random_state=22)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4469c2b4c61eebf",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "## 壹丨GBDT、LGBM、XGBoost训练"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "218aa488ae5b677c",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:22:31.855925200Z",
     "start_time": "2023-11-14T18:22:03.551167800Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "28.284186124801636\n",
      "Mean(RMSE) = 0.5435198338668081\n",
      "Var(RMSE) = 0.00023465103020537168\n"
     ]
    }
   ],
   "source": [
    "# 1. GBDT\n",
    "gbr = GradientBoostingRegressor(n_estimators=100, random_state=22)\n",
    "start = time.time()\n",
    "res = cross_val_score(gbr, train_x, train_y, cv=cv, scoring='neg_mean_squared_error')\n",
    "\n",
    "print(time.time() - start)\n",
    "tmp = (abs(res)**0.5).mean()\n",
    "print(f'Mean(RMSE) = {tmp}')\n",
    "tmp = (abs(res)**0.5).var()\n",
    "print(f'Var(RMSE) = {tmp}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "26bf572feacfecf7",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:22:37.828778900Z",
     "start_time": "2023-11-14T18:22:36.575974700Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[LightGBM] [Info] Total Bins 1837\n",
      "[LightGBM] [Info] Number of data points in the train set: 11558, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.070339\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11558, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.067614\n",
      "[LightGBM] [Info] Total Bins 1837\n",
      "[LightGBM] [Info] Number of data points in the train set: 11558, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.067731\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11559, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.072033\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11559, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.075613\n",
      "1.1566441059112549\n",
      "Mean(RMSE) = 0.4774297398269434\n",
      "Var(RMSE) = 0.00024916379465450555\n"
     ]
    }
   ],
   "source": [
    "# 2. LGBM\n",
    "lgb = LGBMRegressor(n_estimators=100, force_col_wise=True,metric='rmse',random_state=22)\n",
    "start = time.time()\n",
    "res = cross_val_score(lgb, train_x, train_y, cv=cv, scoring='neg_mean_squared_error')\n",
    "\n",
    "print(time.time() - start)\n",
    "tmp = (abs(res)**0.5).mean()\n",
    "print(f'Mean(RMSE) = {tmp}')\n",
    "tmp = (abs(res)**0.5).var()\n",
    "print(f'Var(RMSE) = {tmp}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "b3bdc671e6d7548e",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:22:42.081089100Z",
     "start_time": "2023-11-14T18:22:41.051487700Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.8879978656768799\n",
      "Mean(RMSE) = 0.4830401686217102\n",
      "Var(RMSE) = 0.0002604404270682063\n"
     ]
    }
   ],
   "source": [
    "# 3. XGBoost\n",
    "xgb = XGBRegressor(n_estimators=100, random_state=22)\n",
    "start = time.time()\n",
    "res = cross_val_score(xgb, train_x, train_y, cv=cv, scoring='neg_mean_squared_error')\n",
    "\n",
    "print(time.time() - start)\n",
    "tmp = (abs(res)**0.5).mean()\n",
    "print(f'Mean(RMSE) = {tmp}')\n",
    "tmp = (abs(res)**0.5).var()\n",
    "print(f'Var(RMSE) = {tmp}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "200085430e4f8594",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "结果汇总如下：\n",
    "\n",
    "| 算法 | RMSE   | 方差   | 用时    |\n",
    "| ---- | ------ | ------ | ------- |\n",
    "| GBDT | 0.5435 | 0.0002 | 21.5430 |\n",
    "| LGBM | 0.4774 | 0.0002 | 0.3145  |\n",
    "| XGB  | 0.4830 | 0.0003 | 0.61999 |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e283e4820d403cf",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "## 贰丨模型融合"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "f05fc5f99b1cbdfb",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:24:06.394557800Z",
     "start_time": "2023-11-14T18:24:06.376767Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.ensemble import VotingRegressor\n",
    "\n",
    "# 以元祖的列表方式构建estimators\n",
    "gbr = GradientBoostingRegressor(n_estimators=100,random_state=22)\n",
    "lgb = LGBMRegressor(n_estimators=100,random_state=22,force_col_wise=True,metric='rmse')\n",
    "xgb = XGBRegressor(n_estimators=100,random_state=22)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e377fc824963a1fa",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:24:43.631786500Z",
     "start_time": "2023-11-14T18:24:14.626677500Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Voting] ...................... (1 of 3) Processing GBR, total=   5.6s\n",
      "[LightGBM] [Info] Total Bins 1837\n",
      "[LightGBM] [Info] Number of data points in the train set: 11558, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.070339\n",
      "[Voting] ..................... (2 of 3) Processing LGBM, total=   0.1s\n",
      "[Voting] ...................... (3 of 3) Processing XGB, total=   0.2s\n",
      "[Voting] ...................... (1 of 3) Processing GBR, total=   5.4s\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11558, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.067614\n",
      "[Voting] ..................... (2 of 3) Processing LGBM, total=   0.2s\n",
      "[Voting] ...................... (3 of 3) Processing XGB, total=   0.2s\n",
      "[Voting] ...................... (1 of 3) Processing GBR, total=   5.4s\n",
      "[LightGBM] [Info] Total Bins 1837\n",
      "[LightGBM] [Info] Number of data points in the train set: 11558, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.067731\n",
      "[Voting] ..................... (2 of 3) Processing LGBM, total=   0.1s\n",
      "[Voting] ...................... (3 of 3) Processing XGB, total=   0.2s\n",
      "[Voting] ...................... (1 of 3) Processing GBR, total=   5.5s\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11559, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.072033\n",
      "[Voting] ..................... (2 of 3) Processing LGBM, total=   0.1s\n",
      "[Voting] ...................... (3 of 3) Processing XGB, total=   0.2s\n",
      "[Voting] ...................... (1 of 3) Processing GBR, total=   5.4s\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11559, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.075613\n",
      "[Voting] ..................... (2 of 3) Processing LGBM, total=   0.1s\n",
      "[Voting] ...................... (3 of 3) Processing XGB, total=   0.2s\n"
     ]
    }
   ],
   "source": [
    "# 使用投票法\n",
    "estimators = [('GBR',gbr),('LGBM',lgb),('XGB',xgb)]\n",
    "mix = VotingRegressor(estimators,verbose=True)\n",
    "cv_res = cross_val_score(mix,train_x,train_y,cv=cv,scoring='neg_mean_squared_error')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "a60372d52e6d2619",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:24:43.632787500Z",
     "start_time": "2023-11-14T18:24:43.610790Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mean(RMSE) = 0.47908001792266736\n",
      "Var(RMSE) = 0.00023419211448020937\n"
     ]
    }
   ],
   "source": [
    "tmp = (abs(cv_res)**0.5).mean()\n",
    "print(f'Mean(RMSE) = {tmp}')\n",
    "tmp = (abs(cv_res)**0.5).var()\n",
    "print(f'Var(RMSE) = {tmp}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c1126c21fd4d039",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "融合模型的结果没有单独的LGBM效果好。平均法融合效果更好的前提是：\n",
    "\n",
    "* 评估器是精调之后的强学习器\n",
    "* 被融合的评估器在交叉验证上的分数差异不大\n",
    "* 评估器与评估器之间是相互独立的（调整随机性，使评估器之间的差异更大）"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b94dd04d775a063",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "## 叁丨模型调参后融合"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "58cef628a68631b9",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:25:39.022459300Z",
     "start_time": "2023-11-14T18:24:54.843270600Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "44.16127824783325\n",
      "Mean(RMSE) = 0.5111\n",
      "Var(RMSE) = 0.0004\n"
     ]
    }
   ],
   "source": [
    "gbr = GradientBoostingRegressor(n_estimators=300, \n",
    "                                learning_rate=0.5,\n",
    "                                max_features=0.6,\n",
    "                                random_state=22)\n",
    "start = time.time()\n",
    "res = cross_val_score(gbr, train_x, train_y, cv=cv, scoring='neg_mean_squared_error')\n",
    "\n",
    "print(time.time() - start)\n",
    "tmp = (abs(res)**0.5).mean()\n",
    "print(f'Mean(RMSE) = {tmp:.4f}')\n",
    "tmp = (abs(res)**0.5).var()\n",
    "print(f'Var(RMSE) = {tmp:.4f}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "915ecf6f08def9e6",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:25:40.352773100Z",
     "start_time": "2023-11-14T18:25:39.011461600Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[LightGBM] [Info] Total Bins 1837\n",
      "[LightGBM] [Info] Number of data points in the train set: 11558, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.070339\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11558, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.067614\n",
      "[LightGBM] [Info] Total Bins 1837\n",
      "[LightGBM] [Info] Number of data points in the train set: 11558, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.067731\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11559, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.072033\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11559, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.075613\n",
      "1.304314136505127\n",
      "Mean(RMSE) = 0.5187\n",
      "Var(RMSE) = 0.0002\n"
     ]
    }
   ],
   "source": [
    "lgb = LGBMRegressor(n_estimators=200, \n",
    "                    learning_rate=0.5,\n",
    "                    force_col_wise=True,\n",
    "                    colsample_bytree=0.6,\n",
    "                    metric='rmse',\n",
    "                    random_state=22)\n",
    "start = time.time()\n",
    "res = cross_val_score(lgb, train_x, train_y, cv=cv, scoring='neg_mean_squared_error')\n",
    "\n",
    "print(time.time() - start)\n",
    "tmp = (abs(res)**0.5).mean()\n",
    "print(f'Mean(RMSE) = {tmp:.4f}')\n",
    "tmp = (abs(res)**0.5).var()\n",
    "print(f'Var(RMSE) = {tmp:.4f}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "152c2dfdb780d8da",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:25:41.334657800Z",
     "start_time": "2023-11-14T18:25:40.321773200Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.8009979724884033\n",
      "Mean(RMSE) = 0.5025\n",
      "Var(RMSE) = 0.0003\n"
     ]
    }
   ],
   "source": [
    "xgb = XGBRegressor(n_estimators=100,\n",
    "                   learning_rate=0.5,\n",
    "                   colsample_bytree=0.6,\n",
    "                   random_state=22)\n",
    "start = time.time()\n",
    "res = cross_val_score(xgb, train_x, train_y, cv=cv, scoring='neg_mean_squared_error')\n",
    "\n",
    "print(time.time() - start)\n",
    "tmp = (abs(res)**0.5).mean()\n",
    "print(f'Mean(RMSE) = {tmp:.4f}')\n",
    "tmp = (abs(res)**0.5).var()\n",
    "print(f'Var(RMSE) = {tmp:.4f}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "f7752300c8b1da26",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:26:24.965012Z",
     "start_time": "2023-11-14T18:25:41.242657400Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Voting] ...................... (1 of 3) Processing GBR, total=   8.5s\n",
      "[LightGBM] [Info] Total Bins 1837\n",
      "[LightGBM] [Info] Number of data points in the train set: 11558, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.070339\n",
      "[Voting] ..................... (2 of 3) Processing LGBM, total=   0.2s\n",
      "[Voting] ...................... (3 of 3) Processing XGB, total=   0.1s\n",
      "[Voting] ...................... (1 of 3) Processing GBR, total=   8.3s\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11558, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.067614\n",
      "[Voting] ..................... (2 of 3) Processing LGBM, total=   0.2s\n",
      "[Voting] ...................... (3 of 3) Processing XGB, total=   0.1s\n",
      "[Voting] ...................... (1 of 3) Processing GBR, total=   8.4s\n",
      "[LightGBM] [Info] Total Bins 1837\n",
      "[LightGBM] [Info] Number of data points in the train set: 11558, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.067731\n",
      "[Voting] ..................... (2 of 3) Processing LGBM, total=   0.2s\n",
      "[Voting] ...................... (3 of 3) Processing XGB, total=   0.1s\n",
      "[Voting] ...................... (1 of 3) Processing GBR, total=   8.4s\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11559, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.072033\n",
      "[Voting] ..................... (2 of 3) Processing LGBM, total=   0.2s\n",
      "[Voting] ...................... (3 of 3) Processing XGB, total=   0.1s\n",
      "[Voting] ...................... (1 of 3) Processing GBR, total=   8.2s\n",
      "[LightGBM] [Info] Total Bins 1838\n",
      "[LightGBM] [Info] Number of data points in the train set: 11559, number of used features: 8\n",
      "[LightGBM] [Info] Start training from score 2.075613\n",
      "[Voting] ..................... (2 of 3) Processing LGBM, total=   0.2s\n",
      "[Voting] ...................... (3 of 3) Processing XGB, total=   0.1s\n"
     ]
    }
   ],
   "source": [
    "estimators = [('GBR',gbr),('LGBM',lgb),('XGB',xgb)]\n",
    "mix = VotingRegressor(estimators,verbose=True)\n",
    "cv_res = cross_val_score(mix,train_x,train_y,cv=cv,scoring='neg_mean_squared_error')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "25c3e6d350007180",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2023-11-14T18:26:24.966011Z",
     "start_time": "2023-11-14T18:26:24.957181300Z"
    },
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mean(RMSE) = 0.4682\n",
      "Var(RMSE) = 0.0003\n"
     ]
    }
   ],
   "source": [
    "tmp = (abs(cv_res)**0.5).mean()\n",
    "print(f'Mean(RMSE) = {tmp:.4f}')\n",
    "tmp = (abs(cv_res)**0.5).var()\n",
    "print(f'Var(RMSE) = {tmp:.4f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc8c3b7ba8dcd904",
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "结果汇总如下：\n",
    "\n",
    "| 算法             | RMSE   | 方差   | 用时    |\n",
    "| ---------------- | ------ | ------ | ------- |\n",
    "| Baseline（LGBM） | 0.4774 | 0.0002 | 0.3145  |\n",
    "| GBDT             | 0.5111 | 0.0004 | 44.1612 |\n",
    "| LGBM             | 0.5187 | 0.0002 | 1.3043  |\n",
    "| XGB              | 0.5025 | 0.0003 | 0.8010  |\n",
    "| Averaging        | 0.4682 | 0.0003 | ——      |"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
