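In big-data risk-control practice, most of the industry still builds strategies by stacking rules, which raises several questions:
1. We have 10 rules. After deploying the first 7, how much absolute risk gain do the last 3 still add?
2. Should the rules be ordered, with the most important ones first?
3. What should I do when rules overlap, and I want to cut risk while keeping a relatively high approval rate?
4. After all the rules are applied, to what level does the amount-based overdue risk fall?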
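We therefore wrap the rules in order of importance (high lift and a large volume of bad loans caught), then loop through them one at a time, applying each rule only to the loans that earlier rules left untouched, and compute the absolute gain of each rule from first to last.
The benefit of this code is that each later rule is measured on its own incremental effect rather than being diluted by overlap with earlier rules, so approval capacity is spent where it matters most and no loan is counted under two rules. It also reports risk under different Y-metric definitions, so risk targets can be budgeted more clearly and accurately.
This post is for exchanging code; all strategy thresholds in it are fictitious.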
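The difference between a rule's standalone hits and its marginal hits can be seen on a tiny example. This is only a sketch: the frame `toy` and the flag columns `hit_rule_a` and `hit_rule_b` are hypothetical and stand in for real rule conditions.

```python
import pandas as pd

# Hypothetical toy data: 6 loans and two precomputed rule flags.
toy = pd.DataFrame({
    "loan_id": [1, 2, 3, 4, 5, 6],
    "hit_rule_a": [True, True, True, False, False, False],
    "hit_rule_b": [False, True, True, True, False, False],
})
rule_cols = ["hit_rule_a", "hit_rule_b"]

# Standalone hits: each rule evaluated on the full population.
standalone_hits = {col: int(toy[col].sum()) for col in rule_cols}

# Marginal hits: each rule evaluated only on loans that earlier rules left untouched.
remaining = toy
marginal_hits = {}
for col in rule_cols:
    marginal_hits[col] = int(remaining[col].sum())
    remaining = remaining[~remaining[col]]

print(standalone_hits)  # {'hit_rule_a': 3, 'hit_rule_b': 3}
print(marginal_hits)    # {'hit_rule_a': 3, 'hit_rule_b': 1}
```

Rule b looks as strong as rule a on its own, but two of its three hits were already caught by rule a, so its marginal contribution is a single loan.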
Define the rules and apply them
rules = [
    ('1_rhzx_nnm_over_acct>11', lambda row: row['rhzx_nnm_over_acct'] > 11),
    ('2_rhzx_nnm_over_cnt>=22', lambda row: row['rhzx_nnm_over_cnt'] >= 22),
    ('3_multi_final_level>=33', lambda row: row['multi_final_level'] >= 33),
    # More rules can be added...
]
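Define a function that computes the amount-based overdue risk
def cal_risk2(df2):
    # Sum denominators (fm / fenmu) and numerators (fz / fenzi) over the whole frame
    total_fm7_amt = df2['fpd_fm7_amt'].sum()
    total_fz7_amt = df2['fpd_fz7_amt'].sum()
    total_mob4m2fenmu_amt = df2['mob4m2fenmu_amt'].sum()
    total_mob4m2fenzi_amt = df2['mob4m2fenzi_amt'].sum()
    # Amount-weighted rates; fall back to 0 when a denominator is not positive
    fpd7_rate = total_fz7_amt / total_fm7_amt if total_fm7_amt > 0 else 0
    mob4m2_rate = total_mob4m2fenzi_amt / total_mob4m2fenmu_amt if total_mob4m2fenmu_amt > 0 else 0
    return fpd7_rate, mob4m2_rate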
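A quick check on made-up numbers shows what the function returns. The definition is repeated in compact form only so the snippet runs on its own; the frame `toy` and its amounts are invented for illustration.

```python
import pandas as pd

def cal_risk2(df2):
    # Same logic as the article's function: sum of numerators over sum of denominators.
    fm7 = df2['fpd_fm7_amt'].sum()
    fz7 = df2['fpd_fz7_amt'].sum()
    m2_fm = df2['mob4m2fenmu_amt'].sum()
    m2_fz = df2['mob4m2fenzi_amt'].sum()
    return (fz7 / fm7 if fm7 > 0 else 0,
            m2_fz / m2_fm if m2_fm > 0 else 0)

# Hypothetical amounts for three loans.
toy = pd.DataFrame({
    'fpd_fm7_amt': [1000, 2000, 3000],      # FPD7 denominator (due amount)
    'fpd_fz7_amt': [0, 500, 100],           # FPD7 numerator (overdue amount)
    'mob4m2fenmu_amt': [1000, 1000, 2000],  # MOB4 M2 denominator
    'mob4m2fenzi_amt': [0, 200, 0],         # MOB4 M2 numerator
})

fpd7_rate, mob4m2_rate = cal_risk2(toy)
print(round(fpd7_rate, 4), round(mob4m2_rate, 4))  # 0.1 0.05

# An empty frame has zero denominators, so both rates fall back to 0.
empty_rates = cal_risk2(toy.iloc[0:0])
print(empty_rates)  # (0, 0)
```

Because the rates are ratios of sums, large loans weigh more than small ones, which is what makes this an amount-based rather than a count-based metric.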
Apply the rules and compute the marginal risk reduction
import pandas as pd

def cal_risk_lift(current_df):
    # Column labels: 命中 = hit by the rule, 应用规则后 = after applying the rule
    base_risk = cal_risk2(current_df)
    results2 = [{
        'Rule': 'Baseline',
        'Touched_Count': len(current_df),
        'Due_FPD7_Amt': round(float(current_df['fpd_fm7_amt'].sum()), 2),
        'FPD_Fz7_Amt': current_df['fpd_fz7_amt'].sum(),
        '命中_$FPD7': base_risk[0],
        '命中_$MOB4M2': base_risk[1],
        '应用规则后_$FPD7': base_risk[0],
        '应用规则后_$MOB4M2': base_risk[1],
    }]
    for rule_name, rule_func in rules:
        # Loans hit by this rule among those that earlier rules left untouched
        touched_loans = current_df[current_df.apply(rule_func, axis=1)]
        touched_loan_ids = set(touched_loans['loan_id'].unique())
        # Due amount and overdue amount of the newly hit loans
        new_fm7_amount = round(touched_loans['fpd_fm7_amt'].sum(), 2)
        new_fz7_amount = touched_loans['fpd_fz7_amt'].sum()
        # Rates on the newly hit loans only (purely incremental hits),
        # computed through cal_risk2 so that a zero denominator yields 0
        hit_risk = cal_risk2(touched_loans)
        # To show the total hit rate instead, the original suggests:
        # '命中_$MOB4M2': df['mob4m2fenzi_amt'].sum() / df['mob4m2fenmu_amt'].sum()
        # Risk of the loans that remain after this rule, with loan_id as the unique key
        remaining_df = current_df[~current_df['loan_id'].isin(touched_loan_ids)]
        # cal_risk2 returns (0, 0) for an empty frame, so the result can always be indexed
        new_df_risk = cal_risk2(remaining_df)
        # Record the result
        results2.append({
            'Rule': rule_name,
            'Touched_Count': len(touched_loan_ids),
            'Due_FPD7_Amt': new_fm7_amount,
            'FPD_Fz7_Amt': new_fz7_amount,
            '命中_$FPD7': hit_risk[0],
            '命中_$MOB4M2': hit_risk[1],
            '应用规则后_$FPD7': new_df_risk[0],
            '应用规则后_$MOB4M2': new_df_risk[1],
        })
        # Carry the remaining loans forward to the next rule
        current_df = remaining_df
    # Convert the results to a DataFrame
    return pd.DataFrame(results2)
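Row-wise `apply` with `axis=1` can be slow on large portfolios. One possible alternative, sketched here rather than taken from the article, writes each rule as a function of the whole DataFrame and labels every row with the first rule that hits it. The names `vector_rules` and `first_hit_rule` are hypothetical, and the sketch assumes one row per `loan_id`; with several rows per loan, the loan-level removal used in the article would behave differently.

```python
import pandas as pd

# Hypothetical vectorized rules: each takes the whole frame and returns a boolean Series.
# Thresholds are fictitious, as in the article.
vector_rules = [
    ('1_rhzx_nnm_over_acct>11', lambda df: df['rhzx_nnm_over_acct'] > 11),
    ('2_rhzx_nnm_over_cnt>=22', lambda df: df['rhzx_nnm_over_cnt'] >= 22),
]

def first_hit_rule(df, rule_list):
    """Label each row with the first rule in rule_list that hits it, or 'pass'."""
    label = pd.Series('pass', index=df.index)
    unassigned = pd.Series(True, index=df.index)
    for name, rule in rule_list:
        # A rule only claims rows that no earlier rule has claimed.
        hit = rule(df) & unassigned
        label = label.mask(hit, name)
        unassigned = unassigned & ~hit
    return label

# Made-up data: loan 1 satisfies both rules but is attributed to the first one only.
toy = pd.DataFrame({
    'loan_id': [1, 2, 3, 4],
    'rhzx_nnm_over_acct': [12, 15, 3, 0],
    'rhzx_nnm_over_cnt': [30, 1, 25, 2],
})
toy['first_hit'] = first_hit_rule(toy, vector_rules)
print(toy[['loan_id', 'first_hit']])
```

With such a label column, the per-rule marginal amounts could then be obtained with a single `groupby('first_hit')` and sums of the numerator and denominator columns.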
We can then compute the cumulative lift of the rules per group:
data6.groupby(['xj_sxdurseg']).apply(cal_risk_lift)
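When the function passed to `groupby(...).apply` returns a DataFrame, pandas stacks the per-group results under a two-level index (group key, then the row index of each result). The sketch below shows that shape with a hypothetical stand-in `summarize` in place of `cal_risk_lift`, on made-up data.

```python
import pandas as pd

def summarize(group_df):
    # Hypothetical stand-in for cal_risk_lift: one summary row per group.
    due = group_df['fpd_fm7_amt'].sum()
    overdue = group_df['fpd_fz7_amt'].sum()
    return pd.DataFrame({'Rule': ['Baseline'],
                         'rate': [overdue / due if due > 0 else 0]})

# Made-up data with two segments.
toy = pd.DataFrame({
    'xj_sxdurseg': ['A', 'A', 'B'],
    'fpd_fm7_amt': [1000.0, 1000.0, 500.0],
    'fpd_fz7_amt': [100.0, 0.0, 50.0],
})

# Selecting the needed columns explicitly keeps the grouping column out of each group frame.
by_segment = toy.groupby('xj_sxdurseg')[['fpd_fm7_amt', 'fpd_fz7_amt']].apply(summarize)

# Move the group key out of the index to get a flat table.
flat = by_segment.reset_index(level=0)
print(flat)
```

Calling `reset_index(level=0)` on the result turns the segment back into an ordinary column, which is convenient for exporting or plotting the per-segment rule table.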