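A saying circulates widely in the industry: data and features determine the upper bound of machine learning, while models and algorithms merely approach that bound. Feature engineering therefore holds a very important place in machine learning, and in practice it is often the key to a successful project.
Feature engineering is the most time-consuming and labor-intensive part of data analysis. Unlike algorithms and models, it does not follow a fixed procedure; it relies more on engineering experience and trade-offs, so there is no single unified method. This article only summarizes some commonly used techniques.
Feature engineering covers sub-problems such as data preprocessing, feature extraction, feature selection, and feature construction.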
Feature Selection
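We now have a large number of features available. Some carry rich information, some carry overlapping information, and some are irrelevant. Although it is hard to say which features matter before fitting a model, feeding all of them into training without screening often runs into the curse of dimensionality and can even hurt generalization, because the less useful features drown out the more important ones. We therefore need feature selection: discard invalid or redundant features and keep the useful ones as training data for the model.
There are many feature selection methods, generally grouped into three categories:
- Filter methods are the simplest. They score each feature by its variance or by a correlation measure, then select features by setting either a score threshold or the number of features to keep.
- Wrapper methods select or exclude a subset of features at each step according to an objective function, usually a predictive performance score.
- Embedded methods are slightly more complex. They first train the chosen algorithm to obtain a weight for each feature, then select features in descending order of weight.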
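As a rough illustration of the filter approach described in the list above, the following pure-Python sketch drops zero-variance features and then ranks the rest by absolute Pearson correlation with the label. The helper names (`variance`, `pearson`, `filter_select`) and the toy data are hypothetical and are not part of the notebook's pipeline; a real workflow would compute these statistics in Spark.

```python
import math

def variance(xs):
    """Population variance of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def filter_select(columns, label, min_variance=0.0, top_k=2):
    """Drop low-variance features, then keep the top_k by |correlation| with the label."""
    scores = {
        name: abs(pearson(values, label))
        for name, values in columns.items()
        if variance(values) > min_variance
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data (hypothetical): three candidate features and a binary label.
columns = {
    "income":   [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # zero variance, filtered out
    "noise":    [3.0, 1.0, 2.0, 3.0, 1.0, 2.0],  # uncorrelated with the label
}
label = [0, 0, 0, 1, 1, 1]

selected = filter_select(columns, label, top_k=1)
print(selected)  # ['income']
```

Both the variance threshold and `top_k` are the tuning knobs mentioned for filter methods: one caps the score, the other caps the number of features kept.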
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml import Estimator, Transformer
from pyspark.ml.feature import StringIndexer, VectorAssembler, OneHotEncoder
import pyspark.sql.functions as fn
import pyspark.ml.feature as ft
from pyspark.ml.evaluation import BinaryClassificationEvaluator, MulticlassClassificationEvaluator
from pyspark.ml.linalg import Vectors
from pyspark.sql import Row
from pyspark.sql import Observation
from pyspark.sql import Window
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder, TrainValidationSplit
from xgboost.spark import SparkXGBClassifier
import xgboost as xgb
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import time
import warnings
import gc
# Setting configuration.
warnings.filterwarnings('ignore')
SEED = 42
# Use 0.11.4-spark3.3 version for Spark3.3 and 1.0.2 version for Spark3.4
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("XGBoost with PySpark") \
    .config("spark.driver.memory", "10g") \
    .config("spark.driver.cores", "2") \
    .config("spark.executor.memory", "10g") \
    .config("spark.executor.cores", "2") \
    .enableHiveSupport() \
    .getOrCreate()
sc = spark.sparkContext
sc.setLogLevel('ERROR')
Define the dataset evaluation functions
def timer(func):
    """Decorator that prints the start time and the elapsed time of a call."""
    import time
    import functools

    def strfdelta(tdelta, fmt):
        # Split the elapsed seconds into hours, minutes, and seconds.
        hours, remainder = divmod(tdelta, 3600)
        minutes, seconds = divmod(remainder, 60)
        return fmt.format(hours, minutes, seconds)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        click = time.time()
        print("Starting time\t", time.strftime("%H:%M:%S", time.localtime()))
        result = func(*args, **kwargs)
        delta = strfdelta(time.time() - click, "{:.0f} hours {:.0f} minutes {:.0f} seconds")
        print(f"{func.__name__} cost {delta}")
        return result
    return wrapper

def progress(percent=0, width=50, desc="Processing"):
    """Print an in-place text progress bar; percent is a fraction in [0, 1]."""
    import math
    tags = math.ceil(width * percent) * "#"
    print(f"\r{desc}: [{tags:-<{width}}]{percent:.1%}", end="", flush=True)
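def cross_val_score(df, estimator, evaluator, features, numFolds=3, seed=SEED):
    # Assign every row a pseudo-random fold id in [0, numFolds).
    df = df.withColumn('fold', (fn.rand(seed) * numFolds).cast('int'))
    eval_result = []
    # Initialize an empty dataframe to hold feature importances
    feature_importances = pd.DataFrame(index=features)
    for i in range(numFolds):
        # NOTE: this trains on fold i alone and validates on the remaining folds,
        # the reverse of conventional k-fold CV. Swap the two filters for the
        # standard split; the scores logged in this notebook come from this version.
        train = df.filter(df['fold'] == i)
        valid = df.filter(df['fold'] != i)
        model = estimator.fit(train)
        train_pred = model.transform(train)
        valid_pred = model.transform(valid)
        train_score = evaluator.evaluate(train_pred)
        valid_score = evaluator.evaluate(valid_pred)
        metric = evaluator.getMetricName()
        print(f"[{i}] train's {metric}: {train_score}, valid's {metric}: {valid_score}")
        eval_result.append(valid_score)

        # Map the booster's positional keys ('f0', 'f1', ...) back to column names;
        # features missing from the importance dict get a score of 0.
        fscore = model.get_feature_importances()
        fscore = {name: fscore.get(f'f{k}', 0) for k, name in enumerate(features)}
        # Wrap in a Series so the values align on the feature-name index.
        feature_importances[f'cv_{i}'] = pd.Series(fscore)
    # Average the per-fold importances and rank features by the mean.
    feature_importances['fscore'] = feature_importances.mean(axis=1)
    return eval_result, feature_importances.sort_values('fscore', ascending=False)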
@timer
def score_dataset(df, inputCols=None, featuresCol=None, labelCol='label', nfold=3):
    assert inputCols is not None or featuresCol is not None
    if featuresCol is None:
        # Assemble the feature columns into a single vector column
        featuresCol = "features"
        assembler = VectorAssembler(
            inputCols=inputCols,
            outputCol=featuresCol
        )
        df = assembler.transform(df)
    # Create an Estimator.
    classifier = SparkXGBClassifier(
        features_col=featuresCol,
        label_col=labelCol,
        eval_metric='auc',
        scale_pos_weight=11,
        learning_rate=0.015,
        max_depth=8,
        subsample=1.0,
        colsample_bytree=0.35,
        reg_alpha=65,
        reg_lambda=15,
        n_estimators=500,
        verbosity=0
    )
    evaluator = BinaryClassificationEvaluator(labelCol=labelCol, metricName='areaUnderROC')
    # Train and evaluate with nfold-fold CV (3 folds by default):
    scores, feature_importances = cross_val_score(
        df=df,
        estimator=classifier,
        evaluator=evaluator,
        features=inputCols,
        numFolds=nfold
    )
    print(f"cv_agg's valid auc: {np.mean(scores):.4f} +/- {np.std(scores):.5f}")
    return feature_importances
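To make the fold logic in `cross_val_score` easier to follow, here is a small pure-Python sketch of the same two ideas: random fold assignment and averaging per-fold importances after mapping positional keys back to feature names. It uses the conventional arrangement where fold `i` is held out for validation, whereas the filters in `cross_val_score` above are written the other way round. The helper names, feature names, and importance values are hypothetical, and the `f0`, `f1`, ... key format is an assumption taken from the `fscore.get(f'f{k}', 0)` lookup in the code above.

```python
import random

def assign_folds(n_rows, num_folds, seed=42):
    """Give every row a random fold id in [0, num_folds), like (rand * numFolds) cast to int."""
    rng = random.Random(seed)
    return [int(rng.random() * num_folds) for _ in range(n_rows)]

def kfold_indices(folds, num_folds):
    """Yield (train_idx, valid_idx): fold i is held out, the other folds train."""
    for i in range(num_folds):
        train = [r for r, f in enumerate(folds) if f != i]
        valid = [r for r, f in enumerate(folds) if f == i]
        yield train, valid

def average_importances(per_fold_scores, features):
    """Map keys 'f0', 'f1', ... to feature names; absent features count as 0."""
    totals = {name: 0.0 for name in features}
    for scores in per_fold_scores:
        for k, name in enumerate(features):
            totals[name] += scores.get(f"f{k}", 0)
    n = len(per_fold_scores)
    return {name: total / n for name, total in totals.items()}

folds = assign_folds(n_rows=12, num_folds=3)
splits = list(kfold_indices(folds, num_folds=3))

# Hypothetical per-fold importance dicts, one per trained model.
features = ["age", "income", "tenure"]
per_fold = [{"f0": 4, "f1": 10}, {"f1": 8, "f2": 2}, {"f0": 2, "f1": 12}]
mean_scores = average_importances(per_fold, features)
ranked = sorted(mean_scores, key=mean_scores.get, reverse=True)
print(ranked)  # ['income', 'age', 'tenure']
```

Because fold ids are drawn at random, the folds are only approximately equal in size, which is also true of the `fn.rand(seed)` approach used in the Spark version.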
df = spark.sql("select * from home_credit_default_risk.created_data")
# Persist the data in memory, spilling to disk when it does not fit.
from pyspark.storagelevel import StorageLevel
_ = df.persist(StorageLevel.MEMORY_AND_DISK)
features = df.drop('SK_ID_CURR', 'label').columns
feature_importances = score_dataset(df, inputCols=features)
Starting time 21:40:31
2024-06-03 21:41:08,093 INFO XGBoost-PySpark: _fit Running xgboost-2.0.3 on 1 workers with booster params: {'objective': 'binary:logistic', 'colsample_bytree': 0.35, 'device': 'cpu', 'learning_rate': 0.015, 'max_depth': 8, 'reg_alpha': 65, 'reg_lambda': 15, 'scale_pos_weight': 11, 'subsample': 1.0, 'verbosity': 0, 'eval_metric': 'auc', 'nthread': 1} train_call_kwargs_params: {'verbose_eval': True, 'num_boost_round': 500} dmatrix_kwargs: {'nthread': 1, 'missing': nan}
[21:41:24] task 0 got new rank 0
2024-06-03 21:42:36,308 INFO XGBoost-PySpark: _fit Finished xgboost training!
[0] train's areaUnderROC: 0.8790646375204176, valid's areaUnderROC: 0.7621647570277277
21:45:29,800 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:29,812 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:29,815 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:29,822 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:29,825 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:29,870 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 21:45:31,227 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:31,273 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:31,293 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 21:45:31,295 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:31,296 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:31,311 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:31,311 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:31,325 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:32,860 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:32,886 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:32,895 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:32,895 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 21:45:32,898 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:32,906 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:32,905 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:32,908 INFO 
XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:34,408 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:34,408 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:34,431 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:34,450 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:34,479 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:34,502 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:34,502 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 21:45:34,514 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:35,844 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:35,881 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:35,884 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:35,910 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 21:45:35,911 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:35,920 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:35,930 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:35,950 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:37,483 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:37,487 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:37,489 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:37,489 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:37,505 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 
21:45:37,516 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 21:45:37,516 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:37,527 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:38,967 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:38,987 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:38,991 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:39,009 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:39,069 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 21:45:39,074 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:39,078 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:39,105 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:39,903 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:39,915 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:39,928 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:45:39,935 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs
[1] train's areaUnderROC: 0.8746030416668324, valid's areaUnderROC: 0.7576869026346968
2024-06-03 21:45:49,068 INFO XGBoost-PySpark: _fit Running xgboost-2.0.3 on 1 workers with
booster params: {'objective': 'binary:logistic', 'colsample_bytree': 0.35, 'device': 'cpu', 'learning_rate': 0.015, 'max_depth': 8, 'reg_alpha': 65, 'reg_lambda': 15, 'scale_pos_weight': 11, 'subsample': 1.0, 'verbosity': 0, 'eval_metric': 'auc', 'nthread': 1}
train_call_kwargs_params: {'verbose_eval': True, 'num_boost_round': 500}
dmatrix_kwargs: {'nthread': 1, 'missing': nan}
[21:46:01] task 0 got new rank 0
2024-06-03 21:47:14,685 INFO XGBoost-PySpark: _fit Finished xgboost training!
[2] train's areaUnderROC: 0.8784984656392806, valid's areaUnderROC: 0.7583365874350807
cv_agg's valid auc: 0.7594 +/- 0.00198
score_dataset cost 0 hours 7 minutes 33 seconds
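Univariate Feature Selection¶
Relief (Relevant Features) is a well-known filter-style feature selection method. It assumes that the importance of a feature subset is the sum of the relevance statistics of the individual features in that subset. It is therefore enough to keep the features with the k largest relevance statistics, or those whose relevance statistic exceeds a given threshold.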
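The relevance statistic described above can be sketched in a few lines of NumPy. This is a minimal illustration of the basic two-class Relief update (nearest hit versus nearest miss), not part of the Spark pipeline in this notebook; the function name `relief_scores`, the Manhattan distance, the min-max scaling, and the toy data are all assumptions made for the example, and each class is assumed to contain at least two samples.

```python
import numpy as np

def relief_scores(X, y, n_iter=None, seed=0):
    """Basic two-class Relief: returns one relevance statistic per feature."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    # Scale every feature to [0, 1] so per-feature differences are comparable
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span

    rng = np.random.default_rng(seed)
    # Visit every sample once, or draw n_iter samples with replacement
    idx = rng.permutation(n_samples) if n_iter is None else rng.integers(0, n_samples, n_iter)
    w = np.zeros(n_features)
    for i in idx:
        dist = np.abs(Xs - Xs[i]).sum(axis=1)   # Manhattan distance to every sample
        dist[i] = np.inf                         # exclude the sample itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        # Reward features that differ on the miss, penalize those that differ on the hit
        w += (Xs[i] - Xs[miss]) ** 2 - (Xs[i] - Xs[hit]) ** 2
    return w / len(idx)

# Toy data (hypothetical): feature 0 tracks the label, feature 1 is pure noise
rng = np.random.default_rng(42)
y = rng.integers(0, 2, 200)
X = np.column_stack([y + 0.1 * rng.normal(size=200), rng.normal(size=200)])

scores = relief_scores(X, y)
top_k = np.argsort(scores)[::-1][:1]   # indices of the k=1 highest-scoring features
print(scores, top_k)
```

Selecting features then amounts to sorting `scores` and keeping either the top k or everything above a threshold, as the paragraph describes.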
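pyspark.ml.feature | Description |
---|---|
ChiSqSelector(numTopFeatures, ...) | Selects categorical features for predicting a categorical label |
VarianceThresholdSelector(featuresCol, ...) | Removes all low-variance features |
UnivariateFeatureSelector(featuresCol, ...) | Univariate feature selection |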
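To make the idea behind variance-threshold selection concrete, here is a minimal NumPy sketch. It is not the Spark implementation; the helper name `variance_threshold_select` is hypothetical, and the use of the sample variance (`ddof=1`) with a strict "greater than" comparison is an assumption about the convention.

```python
import numpy as np

def variance_threshold_select(X, threshold=0.0):
    """Return indices of features whose sample variance exceeds `threshold`."""
    X = np.asarray(X, dtype=float)
    variances = X.var(axis=0, ddof=1)          # sample variance of each column
    keep = np.flatnonzero(variances > threshold)
    return keep, variances

# Column 0 is constant, column 1 is binary, column 2 varies the most
X_demo = np.array([[1, 0, 5],
                   [1, 1, 6],
                   [1, 0, 7],
                   [1, 1, 8]])

keep, variances = variance_threshold_select(X_demo, threshold=0.0)
print(keep)        # constant column 0 is dropped
```

With `threshold=0.0` only constant features are removed; raising the threshold also removes near-constant ones.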
UnivariateFeatureSelector
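Selects features for classification or regression tasks with categorical or continuous features. Spark chooses the score function based on the specified featureType and labelType parameters.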
featureType | labelType | score function |
---|---|---|
categorical | categorical | chi-squared (chi2) |
continuous | categorical | ANOVATest (f_classif) |
continuous | continuous | F-value (f_regression) |
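For the continuous-feature, categorical-label case, the score is a one-way ANOVA F-statistic: the ratio of between-class variance to within-class variance of the feature. The sketch below computes it with plain NumPy to show what is being measured; the function name `anova_f_statistic` and the toy arrays are assumptions for illustration, and Spark's own implementation may differ in details.

```python
import numpy as np

def anova_f_statistic(x, y):
    """One-way ANOVA F-statistic of a continuous feature x against class labels y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, k = len(x), len(classes)
    grand_mean = x.mean()
    # Variation of the class means around the grand mean
    ss_between = sum(np.sum(y == c) * (x[y == c].mean() - grand_mean) ** 2 for c in classes)
    # Variation of the samples around their own class mean
    ss_within = sum(np.sum((x[y == c] - x[y == c].mean()) ** 2) for c in classes)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))

labels = np.array([0, 0, 0, 1, 1, 1])
informative = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # class means 2 and 5
uninformative = np.array([1.0, 5.0, 3.0, 2.0, 4.0, 3.0]) # both class means equal 3

f_informative = anova_f_statistic(informative, labels)
f_uninformative = anova_f_statistic(uninformative, labels)
print(f_informative, f_uninformative)
```

A large F-value means the class means are far apart relative to the spread within each class, so the feature is more likely to be kept.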
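It supports five selection modes:
- numTopFeatures selects a fixed number of the highest-scoring features.
- percentile selects a fixed percentage of the highest-scoring features.
- fpr selects all features whose p-value is below a threshold, thereby controlling the false positive rate of selection.
- fdr uses the Benjamini-Hochberg procedure to select all features whose false discovery rate is below a threshold.
- fwe selects all features whose p-value is below a threshold. The threshold is scaled by 1/numFeatures, thereby controlling the family-wise error rate.
Further reading: "How to intuitively understand the family-wise error rate (FWER) and the false discovery rate (FDR)". In short, FWER is the probability of making at least one false positive across all tests, while FDR is the expected proportion of false positives among the selected features.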
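The three p-value-based modes differ only in how the threshold is applied. The following NumPy sketch follows the textbook definitions (plain thresholding, Bonferroni scaling, and the Benjamini-Hochberg step-up rule); the function names and the example p-values are hypothetical, and the exact comparison operators used inside Spark are not assumed here.

```python
import numpy as np

def select_fpr(pvalues, alpha=0.05):
    """Keep every feature whose p-value is below alpha."""
    return np.flatnonzero(np.asarray(pvalues) < alpha)

def select_fwe(pvalues, alpha=0.05):
    """Bonferroni: scale the threshold by 1 / numFeatures."""
    p = np.asarray(pvalues)
    return np.flatnonzero(p < alpha / len(p))

def select_fdr(pvalues, alpha=0.05):
    """Benjamini-Hochberg: find the largest rank i with p_(i) <= alpha * i / m."""
    p = np.asarray(pvalues)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.flatnonzero(below))        # last rank that passes the step-up test
    return np.sort(order[:k + 1])            # keep everything up to that rank

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
fpr_idx = select_fpr(pvals)
fdr_idx = select_fdr(pvals)
fwe_idx = select_fwe(pvals)
print(len(fpr_idx), len(fdr_idx), len(fwe_idx))
```

On the same p-values, fpr is the most permissive, fwe the most conservative, and fdr sits in between, which is why fdr is a common default when many features are tested at once.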
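Correlation Coefficient¶
The Pearson correlation coefficient is one of the simplest measures available: it quantifies the strength of the linear relationship between two continuous variables, on a scale from -1 to 1.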
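As a quick illustration, the coefficient is the covariance of the two variables divided by the product of their standard deviations. The NumPy sketch below uses a hypothetical helper `pearson_corr` and toy arrays; it also shows that a perfectly dependent but non-linear relationship can still score zero.

```python
import numpy as np

def pearson_corr(x, y):
    """Pearson correlation: centered dot product over the product of norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

r_linear = pearson_corr(x, 2 * x + 1)     # perfect positive linear relationship
r_negative = pearson_corr(x, -x)          # perfect negative linear relationship
r_quadratic = pearson_corr(x, x ** 2)     # fully dependent, but not linearly
print(r_linear, r_negative, r_quadratic)
```

This is why a correlation filter only removes linearly redundant features; non-linear redundancy needs other measures.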
class DropCorrelatedFeatures(Estimator, Transformer):
    """Drop numeric features that are highly correlated with an already-kept feature."""

    def __init__(self, inputCols, threshold=0.9):
        super().__init__()
        self.inputCols = inputCols
        self.threshold = threshold

    @timer  # timing decorator defined earlier in this notebook
    def _fit(self, df):
        # Only consider the requested columns, skipping string and vector types
        inputCols = [col for col, dtype in df.dtypes
                     if col in self.inputCols and dtype not in ['string', 'vector']]
        to_keep = [inputCols[0]]
        to_drop = []
        for c1 in inputCols[1:]:
            # Correlations between c1 and every feature kept so far (one Spark job per feature)
            corr = df.select(*[fn.corr(c1, c2) for c2 in to_keep]).toPandas()
            # Drop c1 if any absolute correlation exceeds the threshold
            if np.any(corr.abs().gt(self.threshold)):
                to_drop.append(c1)
            else:
                to_keep.append(c1)
        self.to_drop = to_drop
        self.to_keep = to_keep
        return self

    def _transform(self, df):
        return df.drop(*self.to_drop)

# Drops features that are correlated
# model = DropCorrelatedFeatures(features, threshold=0.9).fit(df)
# correlated = model.to_drop
# print(f'Dropped {len(correlated)} correlated features.')
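The class above is slow because it launches a separate Spark job for every feature, so we ultimately use Spark's built-in correlation matrix instead: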
from pyspark.ml.stat import Correlation

def drop_correlated_features(df, threshold=0.9):
    inputCols = [col for col, dtype in df.dtypes if dtype not in ['string', 'vector']]
    # Assemble the feature columns into a single vector column
    assembler = VectorAssembler(
        inputCols=inputCols,
        outputCol="numericFeatures"
    )
    df = assembler.transform(df)
    # Compute the Pearson correlation matrix in a single pass over the dataset
    corrmat = Correlation.corr(df, 'numericFeatures', 'pearson').collect()[0][0]
    corrmat = pd.DataFrame(corrmat.toArray(), index=inputCols, columns=inputCols)
    # Upper triangle of correlations (each pair appears once, diagonal excluded)
    upper = corrmat.where(np.triu(np.ones(corrmat.shape), k=1).astype('bool'))
    # Absolute value correlation
    corr = upper.unstack().dropna().abs()
    # For each pair above the threshold, drop the row-side feature (level_1)
    to_drop = corr[corr.gt(threshold)].reset_index()['level_1'].unique()
    return to_drop.tolist()

correlated = drop_correlated_features(df.select(features))
selected_features = [col for col in features if col not in correlated]
print(f'Dropped {len(correlated)} correlated features.')
Dropped 127 correlated features.
Chi-square test¶
The chi-square test is a statistical method for measuring the association between two categorical variables.
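For intuition, the sketch below computes the Pearson chi-square statistic for a contingency table by hand, comparing observed counts with the counts expected under independence. The function name `chi_square_statistic` and both tables are hypothetical; this is only an illustration of the statistic, not the code path Spark uses internally.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in cell (i, j) if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat

# Rows: feature categories, columns: label classes (made-up counts)
independent = [[20, 30], [40, 60]]   # same label ratio in both rows
dependent = [[45, 5], [10, 40]]      # label ratio differs sharply between rows

stat_independent = chi_square_statistic(independent)
stat_dependent = chi_square_statistic(dependent)
print(stat_independent, stat_dependent)
```

For a 2x2 table the statistic has one degree of freedom, so a value above roughly 3.84 corresponds to a p-value below 0.05; the second table is far beyond that, while the first gives exactly zero.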
# Find categorical features
int_features = [k for k,v in df.select(selected_features).dtypes if v == 'int']
vector_features = [k for k,v in df.select(selected_features).dtypes if v == 'vector']
nunique = df.select([fn.countDistinct(var).alias(var) for var in int_features]).first().asDict()
categorical_cols = [f for f, n in nunique.items() if n <= 50]
continuous_cols = list(set(selected_features) - set(categorical_cols + vector_features))
from pyspark.ml.feature import UnivariateFeatureSelector
def chi2_test_selector(df, categoricalFeatures, outputCol):
    selector = UnivariateFeatureSelector(
        featuresCol="categoricalFeatures",
        labelCol="label",
        outputCol=outputCol,
        selectionMode="fdr"
    )
    selector.setFeatureType("categorical").setLabelType("categorical").setSelectionThreshold(0.05)
    # Assemble the feature columns into a single vector column
    assembler = VectorAssembler(
        inputCols=categoricalFeatures,
        outputCol="categoricalFeatures"
    )
    df = assembler.transform(df)
    model = selector.fit(df)
    df = model.transform(df)
    # n = df.first()["categoricalFeatures"].size
    n = df.schema["categoricalFeatures"].metadata['ml_attr']['num_attrs']
    print("The number of dropped features:", n - len(model.selectedFeatures))
    return df
df_chi2_test = chi2_test_selector(df, categorical_cols + vector_features, 'selectedFeatures1')
The number of dropped features: 32
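The selector above uses `selectionMode="fdr"` with a threshold of 0.05. According to the Spark documentation, this mode applies the Benjamini-Hochberg procedure to the per-feature p-values. The following plain-Python sketch shows that procedure on made-up p-values; the function name `benjamini_hochberg` and the numbers are assumptions for illustration, not values from this dataset.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    # Indices sorted by ascending p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Find the largest rank k with p_(k) <= alpha * k / m
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff
    return sorted(order[:cutoff_rank])

p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
selected = benjamini_hochberg(p_values, alpha=0.05)
naive = [i for i, p in enumerate(p_values) if p < 0.05]
print(selected, naive)
```

A plain `p < 0.05` cut would keep five of these eight features, whereas the FDR-controlled selection keeps only the two with the strongest evidence.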
def anova_selector(df, continuousFeatures, outputCol):
    selector = UnivariateFeatureSelector(
        featuresCol="continuousFeatures",
        labelCol="label",
        outputCol=outputCol,
        selectionMode="fdr"
    )
    selector.setFeatureType("continuous").setLabelType("categorical").setSelectionThreshold(0.05)
    # Assemble the feature columns into a single vector column
    assembler = VectorAssembler(
        inputCols=continuousFeatures,
        outputCol="continuousFeatures"
    )
    df = assembler.transform(df)
    model = selector.fit(df)
    df = model.transform(df)
    print("The number of dropped features:", len(continuousFeatures) - len(model.selectedFeatures))
    return df
df_anova = anova_selector(df_chi2_test, continuous_cols, 'selectedFeatures2')
The number of dropped features: 30
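With continuous features and a categorical label, Spark's `UnivariateFeatureSelector` scores each feature with an ANOVA F-test. The sketch below computes the one-way ANOVA F statistic by hand for a single feature split by label class; the function name `anova_f_statistic` and the sample values are hypothetical and serve only to show what the score captures.

```python
def anova_f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Feature values grouped by label class (made-up numbers)
separated = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]    # class means differ
overlapping = [[1.0, 5.0, 9.0], [2.0, 5.0, 8.0]]  # class means are equal

f_separated = anova_f_statistic(separated)
f_overlapping = anova_f_statistic(overlapping)
print(f_separated, f_overlapping)
```

A large F value means the feature's mean differs between label classes relative to its spread within each class, which is what makes the feature a candidate to keep.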
_ = score_dataset(df_anova, inputCols=["selectedFeatures1", "selectedFeatures2"], nfold=2)
Starting time 21:49:20
2024-06-03 21:49:21,660 INFO XGBoost-PySpark: _fit Running xgboost-2.0.3 on 1 workers with booster params: {'objective': 'binary:logistic', 'colsample_bytree': 0.35, 'device': 'cpu', 'learning_rate': 0.015, 'max_depth': 8, 'reg_alpha': 65, 'reg_lambda': 15, 'scale_pos_weight': 11, 'subsample': 1.0, 'verbosity': 0, 'eval_metric': 'auc', 'nthread': 1}
train_call_kwargs_params: {'verbose_eval': True, 'num_boost_round': 500}
dmatrix_kwargs: {'nthread': 1, 'missing': nan}
[21:49:39] task 0 got new rank 0
2024-06-03 21:50:34,699 INFO XGBoost-PySpark: _fit Finished xgboost training!
2024-06-03 21:50:38,545 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs
[0] train's areaUnderROC: 0.8526299026274513, valid's areaUnderROC: 0.7632345170337489
2024-06-03 21:51:32,718 INFO XGBoost-PySpark: _fit Running xgboost-2.0.3 on 1 workers with booster params: {'objective': 'binary:logistic', 'colsample_bytree': 0.35, 'device': 'cpu', 'learning_rate': 0.015, 'max_depth': 8, 'reg_alpha': 65, 'reg_lambda': 15, 'scale_pos_weight': 11, 'subsample': 1.0, 'verbosity': 0, 'eval_metric': 'auc', 'nthread': 1}
train_call_kwargs_params: {'verbose_eval': True, 'num_boost_round': 500}
dmatrix_kwargs: {'nthread': 1, 'missing': nan}
[21:51:47] task 0 got new rank 0
2024-06-03 21:52:41,632 INFO XGBoost-PySpark: _fit Finished xgboost training!
2024-06-03 21:52:44,032 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs
21:53:29,133 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 21:53:29,678 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:53:29,709 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:53:30,762 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:53:30,772 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:53:30,773 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:53:30,795 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:53:30,796 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:53:30,796 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 21:53:31,244 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:53:31,283 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:53:31,728 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:53:31,728 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 21:53:31,753 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 21:53:31,761 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs
[1] train's areaUnderROC: 0.8533149455907856, valid's areaUnderROC: 0.757047527015275
cv_agg's valid auc: 0.7601 +/- 0.00309
score_dataset cost 0 hours 4 minutes 16 seconds
del df_chi2_test, df_anova
gc.collect()
671
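Mutual information¶
Mutual information analyzes the relationship between each feature and the target from the perspective of information entropy, capturing both linear and nonlinear dependence.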
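For reference, the standard definition for a discrete feature $X$ and label $Y$ can be written directly in terms of sample counts. The symbols $N_{xy}$, $N_x$, $N_y$ and $n$ are notation chosen here for illustration: $N_{xy}$ is the number of rows with $X=x$ and $Y=y$, $N_x$ and $N_y$ are the marginal counts, and $n$ is the total number of rows.

$$
I(X;Y)=\sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
=\sum_{x}\sum_{y}\frac{N_{xy}}{n}\,\log\frac{n\,N_{xy}}{N_x\,N_y}
$$

The score is zero when $X$ and $Y$ are independent and grows as knowing $X$ removes more uncertainty about $Y$.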
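@timer
def calc_mi_scores(df, inputCols, labelCol):
    """Mutual information between each discrete feature and the label."""
    mi_scores = pd.Series(name="MI Scores", dtype=float)
    n = df.count()
    y = labelCol
    for x in inputCols:
        # Joint counts of every (feature value, label value) pair.
        grouped = df.groupBy(x, y).agg(fn.count("*").alias("Num_xy")).toPandas()
        # Marginal counts of the feature value and of the label value.
        grouped["Num_x"] = grouped.groupby(x)["Num_xy"].transform("sum")
        grouped["Num_y"] = grouped.groupby(y)["Num_xy"].transform("sum")
        # Per-cell term: p(x, y) * log(p(x, y) / (p(x) * p(y))).
        grouped["MI"] = grouped["Num_xy"] / n * np.log(grouped["Num_xy"] / grouped["Num_x"] * n / grouped["Num_y"])
        # Negative cell terms are clipped to zero, so the score is an upper bound on the plug-in MI.
        grouped["MI"] = grouped["MI"].where(grouped["MI"] > 0, 0)
        mi_scores[x] = grouped["MI"].sum()
    mi_scores = mi_scores.sort_values(ascending=False)
    return mi_scores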
The code above uses the mutual information formula for discrete variables, so we first discretize the continuous variables.
numBins = 50
buckets = {f"{col}_binned": col for col in continuous_cols}
bucketizer = ft.QuantileDiscretizer(
    numBuckets=numBins,
    handleInvalid='keep',
    inputCols=continuous_cols,
    outputCols=list(buckets)
).fit(df)
df = bucketizer.transform(df)
discrete_cols = categorical_cols + list(buckets)
class DropUninformative(Estimator, Transformer):
    """Drop features whose mutual information with the label is not above a threshold."""
    def __init__(self, inputCols, labelCol="label", threshold=0.0):
        super().__init__()
        self.threshold = threshold
        self.inputCols = inputCols
        self.labelCol = labelCol

    def _fit(self, df):
        mi_scores = calc_mi_scores(df, self.inputCols, self.labelCol)
        self.to_keep = mi_scores[mi_scores > self.threshold].index.tolist()
        self.to_drop = list(set(self.inputCols) - set(self.to_keep))
        return self

    def _transform(self, df):
        return df.drop(*self.to_drop)

model = DropUninformative(discrete_cols, "label", threshold=0.0).fit(df)
# Map binned column names back to the original continuous feature names.
uninformative = [buckets.get(col, col) for col in model.to_drop]
print('The number of selected features:', len(model.to_keep))
print(f'Dropped {len(uninformative)} uninformative features.')
Starting time 21:54:53
calc_mi_scores cost 0 hours 7 minutes 10 seconds
The number of selected features: 229
Dropped 5 uninformative features.
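Information value (IV)¶
IV (Information Value) evaluates how well a discrete feature predicts a binary target. Features with an IV below 0.02 are generally considered to have no predictive value.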
@timer
def calc_iv_scores(df, inputCols, labelCol="label"):
    """Information value of each discrete feature for a binary label."""
    assert df.select(labelCol).distinct().count() == 2, "y must be binary"
    iv_scores = pd.Series(dtype=float)
    # Compute information value
    for var in inputCols:
        # Summing the label counts the positives, which assumes it is coded as 0/1.
        grouped = df.groupBy(var).agg(
            fn.sum(labelCol).alias('Positive'),
            fn.count('*').alias('All')
        ).toPandas().set_index(var)
        grouped['Negative'] = grouped['All']-grouped['Positive']
        grouped['Positive rate'] = grouped['Positive']/grouped['Positive'].sum()
        grouped['Negative rate'] = grouped['Negative']/grouped['Negative'].sum()
        # A bin with no positives or no negatives yields an infinite WoE and IV.
        grouped['woe'] = np.log(grouped['Positive rate']/grouped['Negative rate'])
        grouped['iv'] = (grouped['Positive rate']-grouped['Negative rate'])*grouped['woe']
        iv_scores[var] = grouped['iv'].sum()
    return iv_scores.sort_values(ascending=False)
iv_scores = calc_iv_scores(df, discrete_cols)
print(f"There are {iv_scores.le(0.02).sum()} features with iv <=0.02.")
Starting time 22:02:03
calc_iv_scores cost 0 hours 6 minutes 38 seconds
There are 98 features with iv <=0.02.
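The weight of evidence of bin $i$ is $\mathrm{WoE}_i=\ln(p_i/q_i)$ and the information value is $\mathrm{IV}=\sum_i (p_i-q_i)\,\mathrm{WoE}_i$, where $p_i$ and $q_i$ are the shares of all positives and all negatives that fall in bin $i$. The sketch below applies this to made-up per-bin counts. It is not the code used in this notebook: the function name `information_value` and the smoothing constant `eps` are assumptions introduced here, the latter so that a bin with no positives or no negatives still gives a finite value.

```python
import numpy as np

def information_value(pos, neg, eps=0.5):
    """IV from per-bin positive/negative counts, with additive smoothing.

    eps is a hypothetical smoothing constant (not part of the notebook's code)
    that keeps the WoE finite when a bin has no positives or no negatives.
    """
    pos = np.asarray(pos, dtype=float) + eps
    neg = np.asarray(neg, dtype=float) + eps
    pos_rate = pos / pos.sum()   # share of all positives falling in each bin
    neg_rate = neg / neg.sum()   # share of all negatives falling in each bin
    woe = np.log(pos_rate / neg_rate)
    return float(((pos_rate - neg_rate) * woe).sum())

# Toy bins with made-up counts.
informative = information_value(pos=[80, 20], neg=[20, 80])    # bins separate the classes
uninformative = information_value(pos=[50, 50], neg=[50, 50])  # bins carry no signal
empty_bin = information_value(pos=[0, 100], neg=[30, 70])      # one bin has no positives
print(informative, uninformative, empty_bin)
```

Smoothing changes the scores slightly, so a threshold such as 0.02 tuned on unsmoothed values may need to be revisited if this variant is used.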
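Gini coefficient¶
The Gini coefficient is used here to measure how strongly a feature influences the target variable in a classification problem. It nominally ranges from 0 to 1, with larger values taken to indicate greater influence. A commonly used threshold is 0.02: a feature whose Gini coefficient falls below it is considered unimportant.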
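@timer
def calc_gini_scores(df, inputCols, labelCol="label"):
    gini_scores = pd.Series(dtype=float)
    # Compute gini score
    for var in inputCols:
        # Positive rate of the label within each category of the feature.
        p = df.groupBy(var).agg(
            fn.mean(labelCol).alias("mean")
        ).toPandas()
        # One minus the sum of squared per-category positive rates. These rates do not
        # sum to 1 across categories, so this is not the size-weighted Gini impurity
        # and the score can fall below 0.
        gini = 1 - p['mean'].pow(2).sum()
        gini_scores[var] = gini
    return gini_scores.sort_values(ascending=False)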
gini_scores = calc_gini_scores(df, discrete_cols)
print(f"There are {gini_scores.le(0.02).sum()} features with gini <=0.02.")
Starting time 22:08:41
calc_gini_scores cost 0 hours 7 minutes 41 seconds
There are 1 features with gini <=0.02.
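For comparison, decision trees usually score a categorical split by its Gini gain: the impurity of the whole sample minus the size-weighted impurity of the groups the feature creates. This is a different quantity from what `calc_gini_scores` computes. The sketch below assumes a small in-memory pandas DataFrame with a 0/1 label; the helper names `gini_impurity` and `gini_gain` and the toy data are hypothetical.

```python
import pandas as pd

def gini_impurity(p):
    """Gini impurity of a binary node whose positive rate is p."""
    return 1.0 - p**2 - (1.0 - p)**2

def gini_gain(df, feature, label="label"):
    """Parent impurity minus the size-weighted impurity of the groups of `feature`."""
    parent = gini_impurity(df[label].mean())
    stats = df.groupby(feature)[label].agg(["mean", "count"])
    weights = stats["count"] / stats["count"].sum()
    children = (weights * gini_impurity(stats["mean"])).sum()
    return parent - children

# Toy data (made up): "useful" determines the label, "noise" barely relates to it.
toy = pd.DataFrame({
    "label":  [1, 1, 1, 0, 0, 0, 0, 0],
    "useful": ["a", "a", "a", "b", "b", "b", "b", "b"],
    "noise":  ["x", "y", "x", "y", "x", "y", "x", "y"],
})
gain_useful = gini_gain(toy, "useful")
gain_noise = gini_gain(toy, "noise")
print(gain_useful, gain_noise)
```

With this definition a larger gain means a more informative feature, and the gain is never negative.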
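Variance inflation factor (VIF)¶
VIF measures the degree of collinearity among features. A VIF below 5 is usually taken to mean there is no multicollinearity problem, while a VIF above 10 indicates pronounced multicollinearity.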
def calc_vif_scores(df):
    pass
# vif_scores = calc_vif_scores(df)
# print(f"There are {vif_scores.gt(10).sum()} collinear features (VIF above 10)")
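The function above is left as a stub. As a rough illustration of what it could compute, the VIF of feature $j$ is $1/(1-R_j^2)$, where $R_j^2$ comes from regressing feature $j$ on all the other features, and these values equal the diagonal of the inverse of the feature correlation matrix. The sketch below uses plain NumPy on synthetic in-memory data rather than a Spark DataFrame; `vif_from_matrix` is a hypothetical helper, not part of this notebook.

```python
import numpy as np

def vif_from_matrix(X):
    """VIF of each column of X, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic data: c is almost a linear combination of a and b.
rng = np.random.default_rng(42)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
c = a + b + 0.1 * rng.normal(size=1000)

vif_collinear = vif_from_matrix(np.column_stack([a, b, c]))
vif_independent = vif_from_matrix(np.column_stack([a, b]))
print(vif_collinear, vif_independent)
```

On a Spark DataFrame the correlation matrix would have to be computed in a distributed way first (for example with `pyspark.ml.stat.Correlation`) before inverting it on the driver. The inversion fails or becomes unstable when features are exactly collinear.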
Summary¶
Finally, we choose to drop the highly correlated features and the uninformative features.
features_to_drop = list(set(uninformative) | set(correlated))
selected_features = [col for col in features if col not in features_to_drop]
print('The number of selected features:', len(selected_features))
print(f'Dropped {len(features_to_drop)} features.')
The number of selected features: 239
Dropped 132 features.
Only 239 of the 371 total features are kept, which suggests that many of the features we created were redundant.
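Recursive feature elimination¶
The most commonly used wrapper method is recursive feature elimination (RFE). RFE trains a machine learning model over multiple rounds: after each round it eliminates the least important features, then trains the next round on the reduced feature set.
Because RFE is resource-intensive, we do not write and run a function for it here.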
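To make the loop concrete, here is a minimal sketch of RFE on small synthetic data. It is not the XGBoost-on-Spark setup used in this notebook: the names `rfe` and `abs_linear_coef` are hypothetical, and a least-squares fit on standardized features stands in for the model, with the absolute coefficients used as importance scores.

```python
import numpy as np

def rfe(X, y, names, n_keep, importance_fn):
    """Repeatedly fit the model, then drop the least important remaining feature."""
    remaining = list(range(X.shape[1]))
    eliminated = []
    while len(remaining) > n_keep:
        scores = importance_fn(X[:, remaining], y)
        worst = remaining[int(np.argmin(scores))]
        remaining.remove(worst)
        eliminated.append(names[worst])
    return [names[i] for i in remaining], eliminated

def abs_linear_coef(X, y):
    """Stand-in model: |coefficients| of a least-squares fit on standardized features."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    coef, *_ = np.linalg.lstsq(Xs, y - y.mean(), rcond=None)
    return np.abs(coef)

# Synthetic data: only f0 and f3 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=500)

kept, dropped = rfe(X, y, ["f0", "f1", "f2", "f3", "f4"],
                    n_keep=2, importance_fn=abs_linear_coef)
print(kept, dropped)
```

The cost is one model fit per eliminated feature, which is why the method becomes expensive with hundreds of features and a slow model. Dropping several features per round reduces the number of fits.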
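Feature importance¶
Embedded methods also use a model to select features, but unlike RFE they do not retrain repeatedly while eliminating features. Instead, the model is trained on the full feature set.
- The most common approach is to use a base model with a penalty term ($\ell_1$ or $\ell_2$ regularization), such as Lasso or Ridge, to select features.
- Alternatively, simply train a base model and select the features with the highest weights.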
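The $\ell_1$ case can be illustrated with a small coordinate-descent Lasso, sketched below under stated assumptions: synthetic in-memory data, standardized features, and a hypothetical penalty strength `alpha=0.1`. The function names `soft_threshold` and `lasso_cd` are introduced here for illustration and are not part of this notebook or of any library.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1 on standardized X."""
    n, d = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # columns get zero mean, unit variance
    yc = y - y.mean()
    w = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            # Residual with feature j's current contribution added back.
            r_j = yc - Xs @ w + Xs[:, j] * w[j]
            rho = Xs[:, j] @ r_j / n
            # Unit-variance columns make the update a plain soft threshold.
            w[j] = soft_threshold(rho, alpha)
    return w

# Synthetic data: only columns 0 and 3 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=500)

w = lasso_cd(X, y, alpha=0.1)
selected = np.flatnonzero(w).tolist()   # indices of features with non-zero weight
print(w, selected)
```

The $\ell_1$ penalty sets the weights of irrelevant features exactly to zero, so the non-zero weights define the selected set. An $\ell_2$ penalty (Ridge) shrinks weights without generally zeroing them, so with Ridge features would be ranked by weight magnitude instead.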
We first use the previously defined score_dataset to get an importance score for each feature:
feature_importances = score_dataset(df, inputCols=selected_features, nfold=2)
Starting time 22:16:22
[0] train's areaUnderROC: 0.8545613660810463, valid's areaUnderROC: 0.7633448519087491
predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:17,254 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:24,789 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:24,792 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:24,808 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:24,816 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:24,818 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:24,823 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:24,838 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:24,839 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:25,873 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:25,941 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:25,942 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:25,943 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:25,944 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:25,944 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:25,945 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:25,976 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:27,032 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:27,046 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:27,077 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:27,104 INFO XGBoost-PySpark: predict_udf Do the 
inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:27,120 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:27,136 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:27,158 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:27,583 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:28,346 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:28,368 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:28,369 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:28,373 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:28,379 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:28,387 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:28,403 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:28,691 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:29,497 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:29,507 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:29,517 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:29,667 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:29,676 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:29,706 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:29,746 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:29,905 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:30,871 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 
2024-06-03 22:20:30,895 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:30,911 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:31,272 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:31,299 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:31,299 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:31,317 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:31,323 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:32,124 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:32,130 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:32,189 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:32,533 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:32,534 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:32,534 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:32,540 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:32,564 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:33,593 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:33,602 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:33,620 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:33,828 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:33,884 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:33,899 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 
22:20:33,899 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:33,928 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:34,803 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:34,805 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:34,830 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:35,056 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:35,182 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:35,203 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:35,205 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:35,280 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:36,261 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:36,266 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:36,276 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:36,381 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:36,497 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:36,561 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:36,573 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:36,637 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:37,443 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:37,452 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:37,461 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:37,466 INFO 
XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:37,681 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:37,687 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:37,791 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:37,808 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:38,852 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:38,852 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:38,885 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:38,894 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:39,065 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:39,184 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:39,211 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:39,226 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:39,939 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:39,959 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:20:39,967 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:20:39,975 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs [Stage 2349:===================================> (67 + 9) / 100]
[1] train's areaUnderROC: 0.8553078656308732, valid's areaUnderROC: 0.7570120756115536
cv_agg's valid auc: 0.7602 +/- 0.00317
score_dataset cost 0 hours 4 minutes 24 seconds
# Sort features according to importance
feature_importances = feature_importances.sort_values('fscore', ascending=False)
feature_importances['fscore'].head(15)
AMT_GOODS_PRICE/AMT_ANNUITY        1448.0
DEF_60_CNT_SOCIAL_CIRCLE           1061.0
AMT_GOODS_PRICE/AMT_CREDIT          924.0
ln(EXT_SOURCE_2)                    867.0
ln(EXT_SOURCE_3)                    793.5
ORGANIZATION_TYPE/DAYS_BIRTH        777.5
DAYS_BIRTH/EXT_SOURCE_1             776.5
EXT_SOURCE_3/ORGANIZATION_TYPE      723.0
EXT_SOURCE_3/DAYS_BIRTH             687.5
ORGANIZATION_TYPE/EXT_SOURCE_1      685.5
centroid_0                          662.5
EXT_SOURCE_2/ORGANIZATION_TYPE      658.5
EXT_SOURCE_2/DAYS_BIRTH             635.0
AMT_ANNUITY/AMT_INCOME_TOTAL        622.5
EXT_SOURCE_1/DAYS_BIRTH             587.0
Name: fscore, dtype: float64
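Many of the features we constructed made it into the top 15, which should give us confidence that all the hard work was worthwhile!
Next, we drop the features with zero importance, since they were never actually used to split a node in any decision tree. Dropping them is therefore a very safe choice (at least for this particular model).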
# Find the features with zero importance
zero_importance = feature_importances.query("fscore == 0.0").index.tolist()
print(f'\nThere are {len(zero_importance)} features with 0.0 importance')
There are 7 features with 0.0 importance
selected_features = [col for col in selected_features if col not in zero_importance]
print("The number of selected features:", len(selected_features))
print("Dropped {} features with zero importance.".format(len(zero_importance)))
The number of selected features: 232
Dropped 7 features with zero importance.
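One caveat when looking for zero-importance features: XGBoost's `Booster.get_score()` only reports features that were used in at least one split, so unused features can be missing from a raw importance table rather than listed with a score of 0. The sketch below uses made-up feature names and scores, not values from this notebook, and assumes the importances arrive as a plain dict. It reindexes them against the full feature list so that the `fscore == 0.0` filter still finds the unused features. If `score_dataset` already fills in the zeros, this step changes nothing.

```python
import pandas as pd

# Hypothetical feature list and raw importances (illustrative values only).
all_features = ["f_a", "f_b", "f_c", "f_d"]
raw_scores = {"f_a": 12.0, "f_c": 3.0}  # f_b and f_d were never used in a split

# Reindex against every candidate feature; missing entries become 0.0.
importances = (
    pd.Series(raw_scores, name="fscore")
    .reindex(all_features, fill_value=0.0)
    .to_frame()
)

# Same filter as in the notebook: features that never split a node.
zero_importance = importances.query("fscore == 0.0").index.tolist()
kept = [col for col in all_features if col not in zero_importance]
print(zero_importance, kept)
```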
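After removing the zero-importance features, 232 features remain. If that still seems like too many, we can go on to drop the least important ones.
The plot below shows cumulative importance against the number of features: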
feature_importances = feature_importances.sort_values('fscore', ascending=False)
sns.lineplot(x=range(1, feature_importances.shape[0]+1), y=feature_importances['fscore'].cumsum())
plt.show()
Suppose we choose to keep only the features needed to reach 95% of the total importance:
def select_import_features(scores, thresh=0.95):
    # `scores` is a pandas Series of importances indexed by feature name
    feature_imp = pd.DataFrame({'score': scores})
    # Sort features according to importance
    feature_imp = feature_imp.sort_values('score', ascending=False)
    # Normalize the feature importances
    feature_imp['score_normalized'] = feature_imp['score'] / feature_imp['score'].sum()
    feature_imp['cumsum'] = feature_imp['score_normalized'].cumsum()
    # Keep features while the cumulative share stays within the threshold
    selected_features = feature_imp.query(f'cumsum <= {thresh}')
    return selected_features.index.tolist()
import_features = select_import_features(feature_importances['fscore'], thresh=0.95)
print("The number of import features:", len(import_features))
print(f'Dropped {len(selected_features) - len(import_features)} features.')
The number of import features: 157
Dropped 75 features.
The remaining 157 features account for about 95% of the total importance.
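Note that a `cumsum <= thresh` filter stops just before the feature that would push the cumulative share past the threshold, so the kept set can cover slightly less than 95%. The minimal sketch below compares that rule with a variant that also keeps the crossing feature. The toy scores and the helper name `n_features_to_reach` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Toy importances (illustrative only), already sorted in descending order.
scores = pd.Series([40.0, 30.0, 20.0, 6.0, 4.0],
                   index=["f1", "f2", "f3", "f4", "f5"])
cum_share = (scores / scores.sum()).cumsum()

thresh = 0.95
# Rule used above: keep features while the cumulative share is <= thresh.
strict = cum_share[cum_share <= thresh].index.tolist()

def n_features_to_reach(cum_share, thresh):
    """Smallest number of top features whose cumulative share is >= thresh."""
    # cum_share is non-decreasing, so a binary search is valid.
    n = int(np.searchsorted(cum_share.to_numpy(), thresh, side="left")) + 1
    return min(n, len(cum_share))

inclusive = cum_share.index[:n_features_to_reach(cum_share, thresh)].tolist()
print(strict)     # features covering just under the threshold
print(inclusive)  # features covering at least the threshold
```

Which rule to prefer is a judgment call: the strict rule never exceeds the budget, while the inclusive one guarantees that at least the requested share of importance is retained.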
feature_importances = score_dataset(df, inputCols=import_features)
Starting time 22:20:46
2024-06-03 22:20:47,988 INFO XGBoost-PySpark: _fit Running xgboost-2.0.3 on 1 workers with
booster params: {'objective': 'binary:logistic', 'colsample_bytree': 0.35, 'device': 'cpu', 'learning_rate': 0.015, 'max_depth': 8, 'reg_alpha': 65, 'reg_lambda': 15, 'scale_pos_weight': 11, 'subsample': 1.0, 'verbosity': 0, 'eval_metric': 'auc', 'nthread': 1}
train_call_kwargs_params: {'verbose_eval': True, 'num_boost_round': 500}
dmatrix_kwargs: {'nthread': 1, 'missing': nan}
[22:20:55] task 0 got new rank 0
2024-06-03 22:21:30,609 INFO XGBoost-PySpark: _fit Finished xgboost training!
22:22:00,884 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:22:00,893 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:22:01,568 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:22:01,569 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:22:01,571 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:22:01,571 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs
[0] train's areaUnderROC: 0.8645996680043095, valid's areaUnderROC: 0.7537419087196509
2024-06-03 22:22:08,815 INFO XGBoost-PySpark: _fit Running xgboost-2.0.3 on 1 workers with
booster params: {'objective': 'binary:logistic', 'colsample_bytree': 0.35, 'device': 'cpu', 'learning_rate': 0.015, 'max_depth': 8, 'reg_alpha': 65, 'reg_lambda': 15, 'scale_pos_weight': 11, 'subsample': 1.0, 'verbosity': 0, 'eval_metric': 'auc', 'nthread': 1}
train_call_kwargs_params: {'verbose_eval': True, 'num_boost_round': 500}
dmatrix_kwargs: {'nthread': 1, 'missing': nan}
[22:22:16] task 0 got new rank 0
2024-06-03 22:22:50,959 INFO XGBoost-PySpark: _fit Finished xgboost training!
[1] train's areaUnderROC: 0.8617688316741262, valid's areaUnderROC: 0.7494887919280331
2024-06-03 22:23:27,496 INFO XGBoost-PySpark: _fit Running xgboost-2.0.3 on 1 workers with
booster params: {'objective': 'binary:logistic', 'colsample_bytree': 0.35, 'device': 'cpu', 'learning_rate': 0.015, 'max_depth': 8, 'reg_alpha': 65, 'reg_lambda': 15, 'scale_pos_weight': 11, 'subsample': 1.0, 'verbosity': 0, 'eval_metric': 'auc', 'nthread': 1}
train_call_kwargs_params: {'verbose_eval': True, 'num_boost_round': 500}
dmatrix_kwargs: {'nthread': 1, 'missing': nan}
[22:23:34] task 0 got new rank 0
2024-06-03 22:24:08,746 INFO XGBoost-PySpark: _fit Finished xgboost training!
CPUs 2024-06-03 22:24:17,200 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:17,202 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:17,205 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:17,214 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:17,221 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:17,225 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:18,280 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:18,281 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:18,287 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:18,295 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:18,295 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:18,314 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:18,316 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:18,317 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:19,084 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:19,098 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:19,109 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:19,109 INFO XGBoost-PySpark: predict_udf Do the 
inference on the CPUs 2024-06-03 22:24:19,110 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:19,127 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:19,131 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:19,143 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:19,914 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:19,920 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:19,919 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:19,925 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:19,932 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:19,937 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:19,939 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:19,958 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:20,770 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:20,771 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:20,772 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:20,773 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:26,174 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:26,204 INFO XGBoost-PySpark: 
predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:26,208 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:26,209 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:26,221 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:26,231 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:26,231 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:26,266 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:27,319 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:27,349 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:27,349 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:27,384 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:27,392 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:27,394 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:27,396 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:27,431 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:28,268 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:28,275 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:28,286 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:28,296 INFO 
XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:28,320 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:28,320 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:28,338 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:28,354 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:29,482 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:29,483 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:29,487 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:29,495 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:29,503 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:29,504 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:29,505 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:29,507 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:30,426 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:30,430 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:30,438 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:30,446 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:30,449 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 
22:24:30,451 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:30,453 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:30,455 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:31,423 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:31,456 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:31,464 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:31,471 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:31,472 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:31,473 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:31,472 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:31,474 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:32,631 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:32,636 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:32,639 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:32,649 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:32,656 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:32,661 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:32,656 INFO XGBoost-PySpark: predict_udf Do the inference on the 
CPUs 2024-06-03 22:24:32,663 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:33,525 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:33,527 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:33,531 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:33,532 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:33,542 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:33,553 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:33,554 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:33,554 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:34,467 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:34,475 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:34,490 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:34,489 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:34,490 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:34,492 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:34,492 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:34,497 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:35,713 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the 
inference on the CPUs 2024-06-03 22:24:35,747 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:35,753 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:35,753 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:35,757 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:35,758 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:35,766 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:35,777 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:36,751 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:36,793 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:36,798 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:36,804 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:36,803 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:36,810 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:36,816 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:36,819 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:38,065 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:38,075 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:38,086 INFO XGBoost-PySpark: 
predict_udf Do the inference on the CPUs 2024-06-03 22:24:38,087 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:38,089 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:38,119 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:38,119 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:38,125 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs 2024-06-03 22:24:38,695 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:38,696 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:38,698 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs 2024-06-03 22:24:38,709 INFO XGBoost-PySpark: predict_udf Do the inference on the CPUs INFO:XGBoost-PySpark:Do the inference on the CPUs [Stage 2433:=========================> (48 + 8) / 100]
[2] train's areaUnderROC: 0.8602027702822611, valid's areaUnderROC: 0.7489103752356694
cv_agg's valid auc: 0.7507 +/- 0.00215
score_dataset cost 0 hours 3 minutes 58 seconds
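Before moving on, we should record the feature selection steps taken so far, for future reference:
- Removed uninformative features with zero mutual information: 5 features dropped
- Removed collinear variables with a correlation coefficient above 0.9: 127 features dropped
- Removed features with zero importance according to the GBM: 7 features dropped
- (Optional) Kept only the features needed to reach 95% of the cumulative feature importance: 75 features dropped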
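The last two steps in the list can be sketched in a few lines of pandas. This is a minimal illustration, not the code used earlier in this chapter: the function name `select_by_cumulative_importance` and the toy importance values are hypothetical, and the importances are assumed to be available as a pandas Series indexed by feature name.

```python
import pandas as pd

def select_by_cumulative_importance(importances: pd.Series, threshold: float = 0.95) -> list:
    """Return the top features whose normalized importances sum to at least `threshold`."""
    # Drop zero-importance features, then sort from most to least important.
    nonzero = importances[importances > 0].sort_values(ascending=False)
    # Normalize so the importances sum to 1, then accumulate them.
    cumulative = (nonzero / nonzero.sum()).cumsum()
    # Keep every feature up to and including the first one that reaches the threshold.
    n_keep = int((cumulative < threshold).sum()) + 1
    return nonzero.index[:n_keep].tolist()

# Hypothetical importances, for illustration only.
toy_importances = pd.Series({"a": 50.0, "b": 30.0, "c": 16.0, "d": 4.0, "e": 0.0})
kept = select_by_cumulative_importance(toy_importances, threshold=0.95)
print(kept)
```

In this toy case the zero-importance feature is removed first, and the lowest-ranked remaining feature is dropped because the first three already cover 96% of the total importance.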
Let's look at the composition of the selected features:
# Read a single row only to obtain the column names of the original table
original_df = spark.sql("select * from home_credit_default_risk.prepared_data").limit(1).toPandas()
original_features = [f for f in selected_features if f in original_df.columns]
derived_features = [f for f in selected_features if f not in original_features]
print(f"Selected features: {len(original_features)} original features, {len(derived_features)} derived features.")
Selected features: 79 original features, 153 derived features.
Of the 232 selected features, 79 are original features and 153 are derived features.
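Principal Component Analysis
Besides models with an L1 penalty, common dimensionality reduction methods include principal component analysis (PCA) and linear discriminant analysis (LDA). The two methods are similar in essence; this section focuses on PCA.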
# PCA on the scaled feature vector; k=len(features) keeps every component
pca = ft.PCA(
    k=len(features),
    inputCol="scaled",
    outputCol="pcaFeatures"
)
# Assemble the feature columns into a single vector column
assembler = VectorAssembler(
    inputCols=features,
    outputCol="features"
)
# Scale the features with statistics that are robust to outliers
scaler = ft.RobustScaler(
    inputCol="features",
    outputCol="scaled"
)
# Fit the three stages; the result is a fitted PipelineModel
pipeline = Pipeline(stages=[assembler, scaler, pca]).fit(df)
# The fitted PCA model is the third stage
pcaModel = pipeline.stages[2]
print("explained variance ratio:\n", pcaModel.explainedVariance[:5])
pca_df = pipeline.transform(df)
weight_matrix = pcaModel.pc
explained variance ratio: [9.47148918e-01 4.88162534e-02 3.38563499e-03 2.82225779e-04 7.15668020e-05]
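Here pcaModel.pc corresponds to the truncated matrix $V$ from the SVD that PCA solves. Its shape is (n_features, n_components), where n_components is the number of principal components we specified and n_features is the number of features in the original data. Each column of pcaModel.pc represents one principal component and each row represents one original feature, so each element of pcaModel.pc is the weight of the corresponding feature in that principal component.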
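To make the shape of $V$ concrete, here is a small NumPy sketch of PCA via SVD on synthetic data. It is an illustration of the linear algebra only, not Spark's implementation; the data and all variable names are assumptions made for this example.

```python
import numpy as np

# Synthetic data with very different variances per feature (illustrative only).
rng = np.random.default_rng(0)
n_samples, n_features, n_components = 200, 5, 3
X = rng.normal(size=(n_samples, n_features)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

# Center each feature, then take the thin SVD: Xc = U @ diag(S) @ Vt.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Truncated V: one column per principal component, one row per original feature.
V = Vt[:n_components].T
print(V.shape)  # (n_features, n_components)

# The squared singular values are proportional to the variance along each component.
explained_variance_ratio = (S ** 2 / np.sum(S ** 2))[:n_components]

# Projecting the centered data onto V gives the principal component scores.
scores = Xc @ V
```

Each entry `V[i, j]` is the weight of original feature `i` in component `j`, which is the same reading given above for pcaModel.pc.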
Visualizing the variance
def plot_variance(pca, n_components=10):
    evr = pca.explainedVariance[:n_components]
    grid = range(1, n_components + 1)
    # Create figure
    plt.figure(figsize=(6, 4))
    # Percentage of variance explained by each component.
    plt.bar(grid, evr, label='Explained Variance')
    # Cumulative Variance
    plt.plot(grid, np.cumsum(evr), "o-", label='Cumulative Variance', color='orange')
    plt.xlabel("The number of Components")
    plt.xticks(grid)
    plt.title("Explained Variance Ratio")
    plt.ylim(0.0, 1.1)
    plt.legend(loc='best')
plot_variance(pcaModel)
plt.show()
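PCA can effectively reduce the number of dimensions, but in essence it maps the original samples into a lower-dimensional space, which means the PCA features carry no direct business meaning. In addition, PCA captures only linear structure and is most informative when the data are approximately normally distributed, which may not hold for real data. We therefore only demonstrate how to use PCA here and do not actually apply it to the data.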
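If PCA were to be applied, one common way to choose the number of components is to take the smallest count whose cumulative explained variance reaches a target. The sketch below uses the five ratios printed above; the helper name `n_components_for` and the target values are assumptions for illustration.

```python
import numpy as np

# The first five explained variance ratios printed above.
evr = np.array([9.47148918e-01, 4.88162534e-02, 3.38563499e-03,
                2.82225779e-04, 7.15668020e-05])

def n_components_for(ratios, target):
    """Smallest number of leading components whose cumulative ratio is >= target."""
    cumulative = np.cumsum(ratios)
    if target > cumulative[-1]:
        raise ValueError("target is not reachable with the given components")
    # searchsorted returns the first index whose cumulative value is >= target.
    return int(np.searchsorted(cumulative, target)) + 1

print(n_components_for(evr, 0.95))
print(n_components_for(evr, 0.999))
```

With these ratios the first component alone falls just short of 95%, so the count depends noticeably on where the target is set.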
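Summary
This chapter covered a number of feature selection methods:
- Univariate feature selection can be used to understand the data, its structure, and its characteristics, and to rule out irrelevant features, but it cannot detect redundant features.
- Regularized linear models can be used for both feature understanding and feature selection, but the features should first be standardized (ideally made approximately normal) so that the penalty treats them comparably.
- Embedded selection based on feature importance is very popular and easy to use, but it has two main problems:
  - An important feature may receive a low score (the correlated-features problem)
  - The method tends to favor features with many categories (the bias problem)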
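The first point in the summary can be illustrated on synthetic data: a univariate score rates a feature and its near-duplicate equally highly, while a pairwise correlation check flags the duplicate. This is a sketch under assumed toy data; the 0.9 threshold mirrors the collinearity step used earlier, and the absolute Pearson correlation stands in for whichever univariate score is actually used.

```python
import numpy as np
import pandas as pd

# Toy data: x2 is a near-duplicate of x1, noise is unrelated to the target.
rng = np.random.default_rng(42)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
noise = rng.normal(size=n)
y = pd.Series((x1 + rng.normal(scale=0.5, size=n) > 0).astype(int))
toy = pd.DataFrame({"x1": x1, "x2": x2, "noise": noise})

# Univariate score: absolute correlation of each feature with the target.
univariate = toy.corrwith(y).abs()
print(univariate)  # x1 and x2 both score high, so neither would be filtered out

# Pairwise check: look only at the upper triangle so each pair is counted once.
corr = toy.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
print(to_drop)
```

Only the later column of each highly correlated pair is dropped, so one representative of the redundant group is kept.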
This concludes classic feature engineering. Next, we use the XGBoost model again to evaluate the selected features.
feature_importances = score_dataset(df, selected_features, nfold=2)
Starting time 22:27:24
2024-06-03 22:27:26,078 INFO XGBoost-PySpark: _fit Running xgboost-2.0.3 on 1 workers with booster params: {'objective': 'binary:logistic', 'colsample_bytree': 0.35, 'device': 'cpu', 'learning_rate': 0.015, 'max_depth': 8, 'reg_alpha': 65, 'reg_lambda': 15, 'scale_pos_weight': 11, 'subsample': 1.0, 'verbosity': 0, 'eval_metric': 'auc', 'nthread': 1} train_call_kwargs_params: {'verbose_eval': True, 'num_boost_round': 500} dmatrix_kwargs: {'nthread': 1, 'missing': nan}
2024-06-03 22:28:42,427 INFO XGBoost-PySpark: _fit Finished xgboost training!
[0] train's areaUnderROC: 0.8514944460937176, valid's areaUnderROC: 0.7609365503074478
2024-06-03 22:29:25,656 INFO XGBoost-PySpark: _fit Running xgboost-2.0.3 on 1 workers with booster params: {'objective': 'binary:logistic', 'colsample_bytree': 0.35, 'device': 'cpu', 'learning_rate': 0.015, 'max_depth': 8, 'reg_alpha': 65, 'reg_lambda': 15, 'scale_pos_weight': 11, 'subsample': 1.0, 'verbosity': 0, 'eval_metric': 'auc', 'nthread': 1} train_call_kwargs_params: {'verbose_eval': True, 'num_boost_round': 500} dmatrix_kwargs: {'nthread': 1, 'missing': nan}
2024-06-03 22:30:40,959 INFO XGBoost-PySpark: _fit Finished xgboost training!
[1] train's areaUnderROC: 0.8528487869720561, valid's areaUnderROC: 0.7552742165606054
cv_agg's valid auc: 0.7581 +/- 0.00283
score_dataset cost 0 hours 4 minutes 8 seconds
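The aggregated line can be checked by hand from the two per-fold validation scores printed above. The sketch below is not the notebook's `score_dataset` implementation (which is not shown in this section); it only assumes the aggregate is the mean of the fold AUCs with a population standard deviation, which reproduces the printed figures. The variable names are illustrative.

```python
from statistics import mean, pstdev

# Validation AUC of each fold, copied from the log lines "[0] ..." and "[1] ...".
fold_valid_auc = [0.7609365503074478, 0.7552742165606054]

# Assumption: the aggregate is the mean and the population standard deviation
# (divide by n, not n - 1) of the per-fold scores.
cv_mean = mean(fold_valid_auc)
cv_std = pstdev(fold_valid_auc)

print(f"cv_agg's valid auc: {cv_mean:.4f} +/- {cv_std:.5f}")
```

With only two folds the spread estimate is very rough, so the `+/-` value is best read as an indication of fold-to-fold variation rather than a confidence interval.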
Feature importance:
# Sort features according to importance
feature_importances['fscore'].sort_values(ascending=False).head(15)
AMT_GOODS_PRICE/AMT_ANNUITY       1351.0
ln(EXT_SOURCE_2)                   878.0
AMT_GOODS_PRICE/AMT_CREDIT         876.5
ORGANIZATION_TYPE/DAYS_BIRTH       840.0
EXT_SOURCE_3/DAYS_BIRTH            749.0
DAYS_BIRTH/EXT_SOURCE_1            724.5
centroid_0                         689.0
ln(EXT_SOURCE_3)                   681.0
EXT_SOURCE_2/DAYS_BIRTH            675.5
EXT_SOURCE_1/DAYS_BIRTH            670.0
EXT_SOURCE_3/ORGANIZATION_TYPE     659.5
AMT_REQ_CREDIT_BUREAU_QRT          638.0
EXT_SOURCE_2/ORGANIZATION_TYPE     635.5
AMT_ANNUITY/AMT_INCOME_TOTAL       629.5
ORGANIZATION_TYPE/EXT_SOURCE_1     593.5
Name: fscore, dtype: float64
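A ranking like this can be turned into a feature list by keeping the top-k names. The sketch below is only an illustration: it rebuilds a small `fscore` Series from five of the values printed above, and both `top_k` and `top_features` are hypothetical names. It is not necessarily how `selected_features` was derived in this notebook.

```python
import pandas as pd

# A few of the fscore values printed above, deliberately out of order.
fscore = pd.Series(
    {
        "EXT_SOURCE_3/DAYS_BIRTH": 749.0,
        "AMT_GOODS_PRICE/AMT_CREDIT": 876.5,
        "AMT_GOODS_PRICE/AMT_ANNUITY": 1351.0,
        "ORGANIZATION_TYPE/DAYS_BIRTH": 840.0,
        "ln(EXT_SOURCE_2)": 878.0,
    },
    name="fscore",
)

top_k = 3  # hypothetical cutoff, chosen only for illustration

# Sort by importance (descending) and keep the names of the top-k features.
top_features = fscore.sort_values(ascending=False).head(top_k).index.tolist()
print(top_features)
```

A fixed k is the simplest rule; a score threshold, or re-running the cross-validation with different cutoffs and comparing validation AUC, are common alternatives.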
Save the dataset
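# Keep the ID, the label, and the selected feature columns
selected_data = df.select('SK_ID_CURR', 'label', *selected_features)
# Write a Hive table bucketed by SK_ID_CURR into 100 buckets, overwriting any existing table
selected_data.write.bucketBy(100, "SK_ID_CURR").mode("overwrite").saveAsTable("home_credit_default_risk.selected_data")
# Release the Spark resources
spark.stop()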