是否能联通
psql -h 127.0.0.1 -U Shinelon-d mimic --password
pgadmin建库
psql -d mimiciv -f mimic-iv/buildmimic/postgres/create.sql
也可以pgadmin直接运行create.sql
导入csv.gz数据
psql -d mimic -v ON_ERROR_STOP=1 -v mimic_data_dir=H:/mimic-iv-2.2 -f C:\Users\Shinelon\Downloads\mimic-code-main\mimic-iv\buildmimic\postgres\load_gz.sql
医院表
表名 | 功能描述 |
---|---|
admissions | 患者入院信息(入院时间、出院时间、入院类型、科室等) |
d_hcpcs | HCPCS 编码字典(解释 HCPCS 代码对应的医疗服务 / 物品含义) |
d_icd_diagnoses | ICD-9 诊断编码字典(关联diagnoses_icd ,提供诊断名称、缩写等) |
d_icd_procedures | ICD-9 手术编码字典(关联procedures_icd ,解释手术编码含义) |
d_labitems | 实验室检测项字典(关联labevents ,解释itemid 对应的检测名称、单位等) |
diagnoses_icd | 患者诊断记录(存储 ICD-9 编码,关联subject_id 、hadm_id ) |
drgcodes | DRG 分组信息(患者的诊断相关分组,用于医保付费等) |
emar | 电子用药医嘱(用药医嘱概要,如药品、时间) |
emar_detail | 电子用药医嘱详情(剂量、频率、给药途径等) |
hcpcs_events | HCPCS 服务记录(患者接受的 HCPCS 编码对应的医疗服务) |
labevents | 实验室检测结果(患者的检验数据,关联d_labitems 的itemid ) |
microbiologyevents | 微生物检测结果(细菌培养、药敏试验等) |
omr | 门诊病历记录(Outpatient Medical Records,门诊相关数据) |
patients | 患者基本信息(性别、出生日期、唯一标识subject_id 等) |
pharmacy | 药房信息(药品库存、调配记录等) |
poe | 医嘱录入(Provider Order Entry,医嘱录入的基本信息) |
poe_detail | 医嘱详情(poe 的补充信息,如医嘱具体参数) |
prescriptions | 处方记录(患者的处方信息,含药品、剂量、开方时间等) |
procedures_icd | 手术记录(存储 ICD-9 手术编码,关联subject_id 、hadm_id ) |
provider | 医护人员信息(ID、姓名、科室等) |
services | 院内服务记录(患者所在服务单元,如 ICU、普通病房等) |
transfers | 转科记录(患者在院内的科室转移时间、去向等) |
卒中icd数据和提取
-
ICD-9 和 ICD-10 中 “卒中(stroke)” 的核心编码范围如下:
ICD-9-CM(MIMIC-III 常用)
卒中编码集中在 430–438(循环系统疾病类目),关键亚类包括:
- 430:蛛网膜下腔出血
- 431:脑内出血(非创伤性)
- 434:脑动脉闭塞(如脑血栓形成、脑栓塞)
- 436:急性难分类脑血管病
- 排除:435(短暂性脑缺血发作,TIA)通常不被视为卒中
ICD-10-CM(MIMIC-IV 常用)
卒中编码对应 I60–I69(脑血管疾患类目),关键亚类包括:
- I60:非创伤性蛛网膜下腔出血
- I61:非创伤性脑内出血
- I63:脑梗死(缺血性卒中)
- I64:未特指为出血或梗死的卒中
- I69:脑血管病后遗症(如卒中后偏瘫)
应用场景(如 MIMIC 数据库)
- 若使用 MIMIC-III(ICD-9),可通过
diagnoses_icd.icd_code and diagnoses_icd.icd_version=9 BETWEEN '430' AND '438'
筛选卒中病例; - 若使用 MIMIC-IV(ICD-10),则通过
diagnoses_icd10.icd_code BETWEEN 'I60' AND 'I69'
筛选。
提取所有ICD-9编码为434(缺血性卒中)的患者唯一ID(subject_id)
SELECT DISTINCT di.subject_id
FROM MIMICIV_HOSP.diagnoses_icd di – ICD-9诊断表
WHERE di.icd_code like ‘434%’ and di.icd_version=9 – 注意ICD编码是字符串类型,需加单引号
or di.icd_code like ‘163%’ and di.icd_version=10 – 注意ICD编码是字符串类型,需加单引号
提取AD的患者
G301 G309 G308 G300 3310
SELECT DISTINCT di.subject_id FROM MIMICIV_HOSP.diagnoses_icd di
WHERE (di.icd_version = 9 AND di.icd_code = ‘3310’) OR
(di.icd_version = 10 AND (di.icd_code IN (‘G301’, ‘G309’, ‘G308’, ‘G300’)) );
诊断排序
SELECT
icd_code,
count(icd_code) AS item_count – 每个itemid的出现次数
FROM MIMICIV_HOSP.diagnoses_icd
GROUP BY icd_code-- 按icd项目分组
ORDER BY desc item_count ; – 按itemid排序
匹配icd title
`import pandas as pd
# 读取诊断数据
all_diagnosis_df = pd.read_csv(r’H:\mimic-iv-2.2\all_diagnosis.csv’)
# 读取ICD字典数据
d_icd_df = pd.read_csv(r’H:\mimic-iv-2.2\hosp\csv\d_icd_diagnoses.csv’)
# 去除icd_code中的所有空格(包括中间和首尾的空格)
# 为原始数据添加处理后的icd_code列,用于匹配
all_diagnosis_df[‘icd_code_clean’] = all_diagnosis_df[‘icd_code’].astype(str).str.replace(r’\s+', ‘’, regex=True)
d_icd_df[‘icd_code_clean’] = d_icd_df[‘icd_code’].astype(str).str.replace(r’\s+', ‘’, regex=True)
# 提取ICD字典中处理后的所有icd_code并转为集合(提高查找效率)
icd_codes_in_dict = set(d_icd_df[‘icd_code_clean’].unique())
# 提取待匹配的所有处理后的icd_code(去重)
all_icd_codes = all_diagnosis_df[‘icd_code_clean’].unique()
# 同时保留原始icd_code用于显示
original_icd_mapping = dict(zip(all_diagnosis_df[‘icd_code_clean’], all_diagnosis_df[‘icd_code’]))
# 存储未找到匹配的icd_code(原始值和处理后的值)
not_found_codes = []
# 逐个匹配并检查
print(“开始匹配ICD代码(已自动去除空格)…”)
for clean_code in all_icd_codes:
original_code = original_icd_mapping[clean_code]
if clean_code not in icd_codes_in_dict:
not_found_codes.append({
‘原始代码’: original_code,
‘处理后代码’: clean_code
})
print(f"未找到匹配: 原始代码=‘{original_code}’,处理后代码=‘{clean_code}’")
# 打印总结信息
print(f"\n匹配完成。共检查 {len(all_icd_codes)} 个唯一ICD代码,其中 {len(not_found_codes)} 个未找到匹配。")
# 执行合并操作(使用处理后的代码进行匹配)
merged_df = pd.merge(
all_diagnosis_df,
d_icd_df[[‘icd_code_clean’, ‘icd_version’, ‘long_title’, ‘icd_code’]], # 保留原始icd_code
on=‘icd_code_clean’,
how=‘left’,
suffixes=(‘’, ‘_dict’) # 区分原始和字典中的icd_code
)
# 调整列顺序,移除临时的clean列
merged_df = merged_df.drop(columns=[‘icd_code_clean’])
# 可选:如果需要可以将字典中的原始icd_code重命名
# merged_df = merged_df.rename(columns={‘icd_code_dict’: ‘icd_code_from_dict’})
# 保存结果文件
csv_path = r’H:\mimic-iv-2.2\all_diagnosis_filled.csv’
merged_df.to_csv(csv_path, index=False)
print(f"\n合并结果已保存至: {csv_path}")
# 可选:将未找到的代码保存到文件
if not_found_codes:
not_found_df = pd.DataFrame(not_found_codes)
not_found_path = r’H:\mimic-iv-2.2\icd_codes_not_found.csv’
not_found_df.to_csv(not_found_path, index=False)
print(f"未找到的ICD代码已保存至: {not_found_path}")
`
前20的诊断
一般诊断数越多病例越多越好出文章
icd_code | item_count | icd_version | long_title |
---|---|---|---|
4019 | 97361 | 9 | Unspecified essential hypertension |
2724 | 63875 | 9 | Other and unspecified hyperlipidemia |
I10 | 51706 | 10 | Essential (primary) hypertension |
E785 | 48581 | 10 | Hyperlipidemia, unspecified |
53081 | 46238 | 9 | Esophageal reflux |
25000 | 41029 | 9 | Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled |
Z87891 | 38893 | 10 | Personal history of nicotine dependence |
42731 | 35180 | 9 | Atrial fibrillation |
311 | 34632 | 9 | Depressive disorder, not elsewhere classified |
4280 | 34597 | 9 | Congestive heart failure, unspecified |
41401 | 34237 | 9 | Coronary atherosclerosis of native coronary artery |
K219 | 33737 | 10 | Gastro-esophageal reflux disease without esophagitis |
V1582 | 30197 | 9 | Personal history of tobacco use |
F329 | 28787 | 10 | Major depressive disorder, single episode, unspecified |
5849 | 27641 | 9 | Acute kidney failure, unspecified |
2449 | 27065 | 9 | Unspecified acquired hypothyroidism |
I2510 | 25398 | 10 | Atherosclerotic heart disease of native coronary artery without angina pectoris |
3051 | 24542 | 9 | Tobacco use disorder |
2859 | 23288 | 9 | Anemia, unspecified |
实验室检查分析统计
统计每个itemid的记录数,并按itemid排列
SELECT
itemid,
count(itemid) AS item_count – 每个itemid的出现次数
FROM MIMICIV_HOSP.labevents
GROUP BY itemid – 按检测项目分组
ORDER BY itemid desc; – 按itemid排序(此时排序有效)
依旧是匹配之后出前20的实验室指标
itemid | item_count | label | fluid | category |
---|---|---|---|---|
51221 | 3325700 | Hematocrit | Blood | Hematology |
50912 | 3282278 | Creatinine | Blood | Chemistry |
51265 | 3216656 | Platelet Count | Blood | Hematology |
51006 | 3189474 | Urea Nitrogen | Blood | Chemistry |
51222 | 3188835 | Hemoglobin | Blood | Hematology |
51301 | 3171798 | White Blood Cells | Blood | Hematology |
51249 | 3167190 | MCHC | Blood | Hematology |
51279 | 3167075 | Red Blood Cells | Blood | Hematology |
51250 | 3167074 | MCV | Blood | Hematology |
51248 | 3167073 | MCH | Blood | Hematology |
51277 | 3166860 | RDW | Blood | Hematology |
50971 | 3150261 | Potassium | Blood | Chemistry |
50983 | 3119778 | Sodium | Blood | Chemistry |
50902 | 3083705 | Chloride | Blood | Chemistry |
50882 | 2972827 | Bicarbonate | Blood | Chemistry |
50868 | 2965452 | Anion Gap | Blood | Chemistry |
50931 | 2749562 | Glucose | Blood | Chemistry |
50893 | 2190816 | Calcium, Total | Blood | Chemistry |
50960 | 2154209 | Magnesium | Blood | Chemistry |
医嘱procedure分析
SELECT
itemid,
count(itemid) AS item_count – 每个itemid的出现次数
FROM MIMICIV_HOSP.procedures_icd
GROUP BY itemid – 按检测项目分组
ORDER BY itemid desc; – 按itemid排序(此时排序有效)
前20procedure
icd_code | item_count | icd_version | long_title | icd_code_dict |
---|---|---|---|---|
3893 | 13928 | 9 | Venous catheterization, not elsewhere classified | 3893 |
02HV33Z | 10061 | 10 | Insertion of Infusion Device into Superior Vena Cava, Percutaneous Approach | 02HV33Z |
8938 | 10041 | 9 | Other nonoperative respiratory measurements | 8938 |
3897 | 9843 | 9 | Central venous catheter placement with guidance | 3897 |
8856 | 9043 | 9 | Coronary arteriography using two catheters | 8856 |
966 | 7811 | 9 | Enteral infusion of concentrated nutritional substances | 966 |
3995 | 7374 | 9 | Hemodialysis | 3995 |
0040 | 7158 | 9 | Procedure on single vessel | 0040 |
9671 | 7036 | 9 | Continuous invasive mechanical ventilation for less than 96 consecutive hours | 9671 |
8952 | 6506 | 9 | Electrocardiogram | 8952 |
5491 | 6245 | 9 | Percutaneous abdominal drainage | 5491 |
9604 | 6203 | 9 | Insertion of endotracheal tube | 9604 |
3722 | 5788 | 9 | Left heart cardiac catheterization | 3722 |
8744 | 5482 | 9 | Routine chest x-ray, so described | 8744 |
4513 | 5470 | 9 | Other endoscopy of small intestine | 4513 |
0066 | 5100 | 9 | Percutaneous transluminal coronary angioplasty [PTCA] | 0066 |
9925 | 5006 | 9 | Injection or infusion of cancer chemotherapeutic substance | 9925 |
3E0G76Z | 4871 | 10 | Introduction of Nutritional Substance into Upper GI, Via Natural or Artificial Opening | 3E0G76Z |
3961 | 4833 | 9 | Extracorporeal circulation auxiliary to open heart surgery | 3961 |
3899 | 4712 | 9 | Other puncture of vein | 3899 |
HCPC分析
HCPC 全称为 Healthcare Common Procedure Coding System(医疗通用程序编码系统),是美国用于标准化描述医疗服务、医疗程序、医疗用品(如器械、药品)的编码体系,由美国医疗保险和医疗补助服务中心(CMS)与美国医学会(AMA)共同维护,核心作用是统一医疗服务的标识,支撑医保结算、医疗数据统计和服务费用核算。
核心组成与用途
- 主要编码类型
- Level I(CPT 编码):对应医生提供的诊疗服务(如手术、检查、门诊治疗),由 AMA 制定,编码为 5 位数字(如 99213 代表常规门诊随访)。
- Level II:对应医疗用品、设备、特殊服务(如轮椅、假肢、家庭医疗服务),编码以字母开头 + 4 位数字(如 E0100 代表标准轮椅)。
- 核心用途
- 医保报销:作为美国医保(如 Medicare/Medicaid)判断服务是否覆盖、计算报销金额的依据;
- 数据标准化:统一不同医疗机构的服务记录,便于医疗质量分析、疾病与治疗关联研究;
- 费用核算:明确医疗服务的具体编码,对应收费标准。
SELECT
hcpcs_cd,
count(hcpcs_cd) AS item_count – 每个itemid的出现次数
FROM MIMICIV_HOSP.hcpcsevents
GROUP BY hcpcs_cd – 按检测项目分组
ORDER BY hcpcs_cd desc; – 按itemid排序(此时排序有效)
前20hcpc
hcpcs_cd | item_count | code | short_description | long_description |
---|---|---|---|---|
G0378 | 53184 | G0378 | Hospital observation per hr | Hospital observation service, per hour |
99219 | 44401 | 99219 | Hospital observation services | |
99218 | 10222 | 99218 | Hospital observation services | |
99220 | 9775 | 99220 | Hospital observation services | |
43262 | 960 | 43262 | Digestive system | |
93454 | 923 | 93454 | Cardiovascular | |
44970 | 921 | 44970 | Digestive system | |
43264 | 839 | 43264 | Digestive system | |
43239 | 719 | 43239 | Digestive system | |
47562 | 717 | 47562 | Digestive system | |
C9600 | 627 | C9600 | Perc drug-el cor stent sing | Percutaneous transcatheter placement of drug eluting intracoronary stent(s), with coronary angioplasty when performed; single major coronary artery or branch |
43235 | 441 | 43235 | Digestive system | |
64447 | 398 | 64447 | Nervous system | |
45380 | 382 | 45380 | Digestive system | |
93458 | 373 | 93458 | Cardiovascular | |
64415 | 347 | 64415 | Nervous system | |
92980 | 328 | 92980 | Cardiovascular | |
43274 | 323 | 43274 | Digestive system | |
36246 | 311 | 36246 | Cardiovascular system |