      salary
不限  19600.000000
大专  10000.000000
本科  19361.344538
硕士  20642.857143
df.groupby('education').mean()
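As a minimal sketch of the grouped mean above, the example below selects the salary column before aggregating. The `sample` frame and its values are hypothetical stand-ins for `df`. Selecting the numeric column first is a precaution: when the frame also holds text columns such as createTime, recent pandas versions may raise an error instead of silently skipping them.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the text above.
sample = pd.DataFrame({
    "education": ["本科", "本科", "硕士", "大专"],
    "salary": [10000, 20000, 30000, 8000],
})

# Pick the numeric column first so that only salary is averaged per group.
mean_salary = sample.groupby("education")["salary"].mean()
print(mean_salary)
```

The result is a Series indexed by education level. Use `[["salary"]]` instead of `["salary"]` to get a one-column DataFrame like the output shown above.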
Exercise: Convert the times in the createTime column to month-day format
# Walk the rows and overwrite column 0 (createTime) with a "month-day" string
for index, row in df.iterrows():
    df.iloc[index, 0] = df.iloc[index, 0].to_pydatetime().strftime("%m-%d")
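The row-by-row loop can probably be replaced with a vectorized call. The sketch below assumes createTime already holds datetime values. The `sample` frame and its timestamps are made up for illustration.

```python
import pandas as pd

# Hypothetical timestamps standing in for the createTime column.
sample = pd.DataFrame({
    "createTime": pd.to_datetime([
        "2020-03-16 11:30:18",
        "2020-03-16 10:58:48",
        "2020-12-01 09:00:00",
    ])
})

# .dt.strftime formats every element at once, with no explicit Python loop.
sample["createTime"] = sample["createTime"].dt.strftime("%m-%d")
print(sample["createTime"].tolist())  # ['03-16', '03-16', '12-01']
```

After the conversion the column holds plain strings rather than datetimes, which matches the `object` dtype reported for createTime in the info output.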
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 135 entries, 0 to 134
Data columns (total 4 columns):
createTime 135 non-null object
education 135 non-null object
salary 135 non-null int64
categories 135 non-null category
dtypes: category(1), int64(1), object(2)
memory usage: 3.5+ KB
bins = [0,5000, 20000, 50000]
group_names = ['低', '中', '高'] # low, medium, high
df['categories'] = pd.cut(df['salary'], bins, labels=group_names)
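To illustrate how `pd.cut` assigns the labels, here is a small sketch using the same bins and labels on hypothetical salary values. By default the intervals are closed on the right, so 5000 falls in the first bin, and values outside all bins become NaN.

```python
import pandas as pd

# Hypothetical salaries chosen to hit the bin edges.
salaries = pd.Series([3500, 5000, 5001, 20000, 45000, 60000])

bins = [0, 5000, 20000, 50000]
group_names = ["低", "中", "高"]  # low, medium, high

# Default right=True gives the intervals (0, 5000], (5000, 20000], (20000, 50000].
labels = pd.cut(salaries, bins, labels=group_names)
print(labels.tolist())  # ['低', '低', '中', '中', '高', nan]
```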
df.sort_values('salary', ascending=False)
np.median(df['salary'])
# 17500.0
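import matplotlib.pyplot as plt
# In Jupyter, run this magic command so that matplotlib figures render inline
%matplotlib inline
plt.rcParams['font.sans-serif'] = ['SimHei'] # fix garbled Chinese characters in labels
plt.rcParams['axes.unicode_minus'] = False # fix rendering of the minus sign
plt.hist(df.salary)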
# The plot can also be drawn with pandas' native plotting methods
df.salary.plot(kind='hist')
df.salary.plot(kind='kde',xlim = (0,70000))
del df['categories']
# equivalent to
df.drop(columns=['categories'], inplace=True)
df['test'] = df['education'] + df['createTime']
Exercise: Merge the education column and the salary column into a new column
Note: salary is of int type, so the operation differs from exercise 35
df["test1"] = df["salary"].map(str) + df['education']
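The integer column has to be converted to strings before it can be concatenated with a text column. A minimal sketch on hypothetical data, using `astype(str)` as an alternative to `map(str)`:

```python
import pandas as pd

# Hypothetical rows; only the column names come from the text above.
sample = pd.DataFrame({
    "education": ["本科", "硕士"],
    "salary": [20000, 30000],
})

# Convert the int column to str first, then concatenate element-wise.
sample["test1"] = sample["salary"].astype(str) + sample["education"]
print(sample["test1"].tolist())  # ['20000本科', '30000硕士']
```

Both `map(str)` and `astype(str)` should give the same result here. `astype` is the more common idiom for a whole-column type change.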
df[['salary']].apply(lambda x: x.max() - x.min())
# salary 41500
# dtype: int64
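The expression above computes the spread between the largest and smallest salary. As a sketch on hypothetical values, the same number can be obtained directly from the Series or with NumPy's `np.ptp` ("peak to peak"):

```python
import numpy as np
import pandas as pd

# Hypothetical salaries whose extremes give a spread of 41500.
sample = pd.DataFrame({"salary": [3500, 17500, 45000]})

# Direct Series arithmetic, no apply/lambda needed.
spread = sample["salary"].max() - sample["salary"].min()

# NumPy equivalent, applied to the underlying array.
spread_np = np.ptp(sample["salary"].to_numpy())

print(spread, spread_np)  # 41500 41500
```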
pd.concat([df[1:2], df[-1:]])
df.dtypes
# createTime object
# education object
# salary int64
# test object
# test1 object
# dtype: object
df.set_index("createTime")
Exercise: Generate a DataFrame of random numbers with the same length as df
df1 = pd.DataFrame(pd.Series(np.random.randint(1, 10, 135)))
Exercise: Merge the DataFrame generated in the previous exercise with df
df = pd.concat([df, df1], axis=1)
Exercise: Create a new column new equal to the salary column minus the previously generated random-number column
df["new"] = df["salary"] - df[0]
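The three steps above can be sketched end to end on a small hypothetical frame. This version uses `len(sample)` instead of a hard-coded row count and a seeded generator so the run is repeatable. Both choices are assumptions rather than part of the original exercise. The random column is named `0` because a DataFrame built without column names gets default integer labels.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df.
sample = pd.DataFrame({"salary": [10000, 20000, 30000]})

# One random integer in [1, 10) per row of sample.
rng = np.random.default_rng(0)
random_df = pd.DataFrame(rng.integers(1, 10, size=len(sample)))

# axis=1 places the frames side by side, aligned on the row index.
combined = pd.concat([sample, random_df], axis=1)

# The unnamed random column carries the integer label 0.
combined["new"] = combined["salary"] - combined[0]
print(combined)
```

Because `concat` aligns on the index, both frames need the same row labels. Here both use the default range index.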
df.isnull().values.any()
# False
df['salary'].astype(np.float64)
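Note that `astype` returns a new Series and leaves the original column untouched. A short sketch on hypothetical data shows the difference, with assignment as one way to keep the converted values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for df.
sample = pd.DataFrame({"salary": [10000, 20000]})

# astype builds a converted copy; sample["salary"] itself is unchanged.
converted = sample["salary"].astype(np.float64)
print(sample["salary"].dtype, converted.dtype)  # int64 float64

# Assign the result back to make the conversion stick.
sample["salary"] = converted
```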
len(df[df['salary'] > 10000])
# 119
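The count above filters the frame and then measures its length. An equivalent sketch on hypothetical values sums the boolean mask directly, which avoids building the filtered frame:

```python
import pandas as pd

# Hypothetical salaries; two of them exceed 10000.
sample = pd.DataFrame({"salary": [3500, 10000, 12000, 45000]})

# True counts as 1 when summed, so this is the number of matching rows.
mask = sample["salary"] > 10000
count = int(mask.sum())

print(count, len(sample[mask]))  # 2 2
```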