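Scikit-learn (sklearn for short) is one of the most established and widely used Python libraries for traditional machine learning. It focuses on classical algorithms and does not target deep learning.
Positioning in one sentence:
A one-stop toolkit for traditional machine learning: simple, stable, and easy to use.
Key features:
Most typical usage: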
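# X_train, y_train, and X_test are assumed to be existing arrays
# (features and labels) that have already been split.
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)      # learn from the training data
y_pred = model.predict(X_test)   # predict labels for unseen data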
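The template below is a commonly used, complete, runnable scikit-learn workflow covering data loading, train/test splitting, training, prediction, and evaluation. It can be copied and run as is.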
1. Installation
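The install command itself is not shown in this section. Assuming a standard pip-based environment, the usual command is `pip install -U scikit-learn` (or `conda install scikit-learn` under conda). A minimal sketch for checking that the install worked:

```python
# Quick check that scikit-learn is installed and importable.
import sklearn

print("scikit-learn version:", sklearn.__version__)
```

If the import fails, the package is most likely installed into a different Python environment than the one running the script.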
2. Complete example: classification (Iris dataset + logistic regression)
# Imports
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
# 1. Load the data
data = load_iris()
X = data.data    # features
y = data.target  # labels
# 2. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# 3. Define and train the model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
# 4. Predict
y_pred = model.predict(X_test)
# 5. Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification report:")
print(classification_report(y_test, y_pred))
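Once a model is trained, the same `predict` call works on new, unlabeled samples. The sketch below retrains the model from the example above and classifies one flower. The measurements in `new_sample` are illustrative values chosen for this example, not data from the original text.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Rebuild the model from the complete example so this block runs on its own.
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Illustrative measurements in cm (an assumption for this sketch):
# sepal length, sepal width, petal length, petal width.
new_sample = [[5.1, 3.5, 1.4, 0.2]]

# predict returns a class index; target_names maps it to a species name.
pred_idx = model.predict(new_sample)[0]
pred_name = data.target_names[pred_idx]

# predict_proba returns one probability per class for each sample.
proba = model.predict_proba(new_sample)[0]

print("Predicted class:", pred_name)
print("Class probabilities:", dict(zip(data.target_names, proba.round(3))))
```

Note that `predict` expects a 2D input (a list of samples), which is why a single sample is wrapped in an extra pair of brackets.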
3. Quick reference: common models
# Classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
# Regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
# Clustering
from sklearn.cluster import KMeans
# Dimensionality reduction
from sklearn.decomposition import PCA
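The estimators listed above share the same interface (`fit`, then `predict`, `score`, or `transform`), so swapping one model for another usually means changing a single line. A minimal sketch on the Iris data, with hyperparameters picked only for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters here are illustrative choices, not tuned values.
classifiers = {
    "LogisticRegression": LogisticRegression(max_iter=200),
    "KNeighborsClassifier": KNeighborsClassifier(n_neighbors=5),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=42),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
    "SVC": SVC(),
}

# Every classifier is trained and scored with exactly the same calls.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name}: {scores[name]:.3f}")

# Unsupervised estimators need no labels: PCA projects the four
# features down to two, KMeans groups the samples into three clusters.
X_2d = PCA(n_components=2).fit_transform(X)
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print("PCA output shape:", X_2d.shape)
print("Clusters found:", sorted(set(cluster_labels)))
```

The regressors (`LinearRegression`, `RandomForestRegressor`) follow the same `fit`/`predict` pattern, with a continuous target in place of class labels.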
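4. Common utilities
# Train/test splitting
from sklearn.model_selection import train_test_split
# Preprocessing
from sklearn.preprocessing import StandardScaler, MinMaxScaler
# Cross-validation
from sklearn.model_selection import cross_val_score
# Hyperparameter tuning with grid search
from sklearn.model_selection import GridSearchCV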
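These utilities are typically combined. The sketch below scales the features, estimates accuracy with 5-fold cross-validation, and then tunes an SVC with grid search. `Pipeline` is not mentioned in the list above and is used here as an assumption, to keep the scaler fitted only on the training folds. The parameter grid is an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Chain scaling and the classifier so both are refit inside each CV fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
])

# 5-fold cross-validation on the training set with default SVC settings.
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)
print("CV accuracy per fold:", cv_scores.round(3))
print("Mean CV accuracy:", round(cv_scores.mean(), 3))

# Grid search: "<step name>__<parameter>" addresses a parameter of a
# pipeline step. The values below are illustrative, not recommendations.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
print("Test accuracy:", round(search.score(X_test, y_test), 3))
```

After fitting, `search` behaves like the best model found, so `search.predict` and `search.score` can be called directly.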