专栏名称: 机器学习研究会
机器学习研究会是北京大学大数据与机器学习创新中心旗下的学生组织,旨在构建一个机器学习从事者交流的平台。除了及时分享领域资讯外,协会还会举办各种业界巨头/学术神牛讲座、学术大牛沙龙分享会、real data 创新竞赛等活动。
目录
相关文章推荐
51好读  ›  专栏  ›  机器学习研究会

[比赛记录] 主流机器学习模型模板代码+经验分享[xgb, lgb, Keras, LR]

机器学习研究会  · 公众号  · AI  · 2017-12-15 22:35

正文

请到「今天看啥」查看全文


import numpy as np

from scipy import sparse

from sklearn.preprocessing import OneHotEncoder

from sklearn.linear_model import LogisticRegression

from sklearn.preprocessing import StandardScaler


# 1. load data

df_train = pd.DataFrame()

df_test  = pd.DataFrame()

y_train = df_train['label'].values


# 2. process data

ss = StandardScaler()



# 3. feature engineering/encoding

# 3.1 For Labeled Feature

enc = OneHotEncoder()

feats = ["creativeID", "adID", "campaignID"]

for i, feat in enumerate(feats):

x_train = enc.fit_transform(df_train[feat].values.reshape(-1, 1))

x_test = enc.fit_transform(df_test[feat].values.reshape(-1, 1))

if i == 0:

X_train, X_test = x_train, x_test

else:

X_train, X_test = sparse.hstack((X_train, x_train)), sparse.hstack((X_test, x_test))


# 3.2 For Numerical Feature

# It must be a 2-D Data for StandardScalar, otherwise reshape(-1, len(feats)) is required

feats = ["price", "age"]







请到「今天看啥」查看全文