[Machine Learning] Support Vector Machine Lab Report: Classification Prediction with SVM

Contents

1. Experiment Description

2. Experiment Steps

3. Python Implementation of SVM-Based Classification Prediction

4. What I Learned

5. Reflections


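1. Experiment Description

Experiment title: Classification prediction with SVM

Requirements: using the provided data, apply the support vector machine (SVM) algorithm to perform classification prediction. Specifically: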

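Select variables (e.g., trip distance, fare, time) and preprocess the data (e.g., handle missing values and outliers, normalize or standardize the data). Because the dataset is very large, a subset may be used, but it must contain at least 100,000 records, and the basis for the selection must be explained. Use the SVM algorithm to predict the payment type (cash / credit card) of Chicago taxi trips (note: the provided dataset must be used, and the data must be processed in Python). The ratio of training data to prediction data is 80%:20%, and the procedure used to validate accuracy must be stated explicitly. In addition, use the other variables in the dataset for further analysis, exploring how strongly different factors influence the payment type. New findings are welcome. Finally, complete the report.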
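The requirements above (standardize, split 80%:20%, train an SVM, report accuracy) could be sketched roughly as follows. This is a minimal illustration rather than the report's actual code: it assumes scikit-learn and NumPy are installed, and `make_synthetic_trips` is a hypothetical helper that generates made-up data in place of the real taxi records, so the printed accuracy says nothing about the real dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_trips(n=2000, seed=0):
    """Hypothetical stand-in for cleaned trips: columns are seconds, miles, fare."""
    rng = np.random.default_rng(seed)
    miles = rng.uniform(0.5, 20.0, n)
    seconds = miles * 180 + rng.normal(0, 120, n)
    fare = 3.25 + 2.25 * miles + rng.normal(0, 1.5, n)
    # Made-up labeling rule (an assumption): pricier trips lean toward credit card (1).
    label = (fare + rng.normal(0, 8.0, n) > 25).astype(int)
    return np.column_stack([seconds, miles, fare]), label


def train_and_evaluate(X, y, seed=0):
    """Split 80%:20%, fit scaler + SVM on the training part, score on the held-out part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    # The scaler sits inside the pipeline so it is fitted on training data only.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy, len(X_train), len(X_test)


X, y = make_synthetic_trips()
model, accuracy, n_train, n_test = train_and_evaluate(X, y)
print(f"train={n_train} test={n_test} accuracy={accuracy:.3f}")
```

On the full set of 100,000 or more rows an RBF `SVC` can be slow to train. `sklearn.svm.LinearSVC` is one commonly used alternative, at the cost of a linear decision boundary.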

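Report contents: problem statement, objectives, data description (in text, supported by charts of temporal, spatial, and multi-factor distributions), method (the support vector machine (SVM) algorithm is required throughout, and the method section must fully document the algorithm flow and formulas, the data processing flow, and the validation flow), results (with parameter settings, prediction accuracy, etc.), and analysis of results (results supported by multiple charts are encouraged, along with an analysis of how important each factor is to the results; interesting findings earn extra credit).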
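For reference, the formulas that a method section on SVM usually starts from are the standard soft-margin problem and the test-set accuracy. The notation below is an assumption made for illustration (labels $y_i \in \{-1, +1\}$, e.g. cash as $-1$ and credit card as $+1$; $\phi$ is the feature map induced by the kernel; $C$ is the penalty parameter; $m$ is the number of test samples) and is not taken from the assignment text.

```latex
\[
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad
\xi_i \ge 0,\quad i = 1,\dots,n
\end{aligned}
\]
\[
\hat{y}(x) = \operatorname{sign}\bigl(w^{\top}\phi(x) + b\bigr),
\qquad
\text{accuracy} = \frac{1}{m}\sum_{j=1}^{m}\mathbf{1}\bigl[\hat{y}(x_j) = y_j\bigr]
\]
```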

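Dataset description:

Dataset: Chicago Taxi Trips Dataset (2023), containing Chicago taxi trip records

Trip ID: trip identifier    Taxi ID: taxi identifier

Trip Start/End Timestamp: pickup/drop-off time    Trip Seconds: trip duration (seconds)    Trip Miles: trip distance (miles)

Pickup/Dropoff Census Tract: pickup/drop-off census tract number, i.e. the number of the U.S. census tract that contains the pickup or drop-off location

Pickup/Dropoff Community Area: pickup/drop-off community area number, one of the 77 community areas defined by the City of Chicago

Fare: base fare    Tips: tip amount    Tolls: road and bridge tolls    Extras: extra charges    Trip Total: total cost

Payment Type: payment method    Company: taxi company

Pickup Centroid Latitude/Longitude: latitude/longitude of the pickup point

Pickup Centroid Location: geographic coordinates of the pickup location (in the data, stored as POINT (longitude latitude))

Dropoff Centroid Latitude/Longitude: latitude/longitude of the drop-off point

Dropoff Centroid Location: geographic coordinates of the drop-off location (in the data, stored as POINT (longitude latitude))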
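As a rough illustration of how these fields might be read in Python, the sketch below parses a reduced sample with the standard library only. The two sample rows copy values from the first two records listed in this post, but keep only some of the columns and replace the long Trip ID hashes with the placeholders `trip-1` and `trip-2`. The helper names (`to_float`, `parse_point`, `parse_row`) and the 0/1 label encoding are assumptions made for this example.

```python
import csv
import io
from datetime import datetime

# Reduced sample: a subset of columns, with placeholder Trip IDs.
SAMPLE = """Trip ID,Trip Start Timestamp,Trip Seconds,Trip Miles,Pickup Community Area,Fare,Tips,Trip Total,Payment Type,Pickup Centroid Location
trip-1,12/31/2023 11:45:00 PM,982,0.81,8,8.75,3,13.25,Credit Card,POINT (-87.6378442095 41.8932163595)
trip-2,12/31/2023 11:45:00 PM,390,0.09,,34.86,0,34.86,Cash,
"""


def to_float(text):
    """Return a float, or None when the field is empty (a missing value)."""
    text = text.strip()
    return float(text) if text else None


def parse_point(text):
    """Parse 'POINT (lon lat)' into a (lat, lon) tuple; None when empty."""
    text = text.strip()
    if not text:
        return None
    lon, lat = text[text.index("(") + 1 : text.index(")")].split()
    return float(lat), float(lon)


def parse_row(row):
    """Convert one CSV record (a dict of strings) into typed values."""
    start = datetime.strptime(row["Trip Start Timestamp"], "%m/%d/%Y %I:%M:%S %p")
    return {
        "hour": start.hour,
        "seconds": to_float(row["Trip Seconds"]),
        "miles": to_float(row["Trip Miles"]),
        "pickup_area": to_float(row["Pickup Community Area"]),
        "fare": to_float(row["Fare"]),
        "tips": to_float(row["Tips"]),
        "total": to_float(row["Trip Total"]),
        # Assumed encoding: Cash -> 0, Credit Card -> 1, anything else -> None.
        "label": {"Cash": 0, "Credit Card": 1}.get(row["Payment Type"]),
        "pickup_latlon": parse_point(row["Pickup Centroid Location"]),
    }


trips = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for trip in trips:
    print(trip)
```

Empty fields come back as `None`, which makes the missing community areas and coordinates in the raw data easy to count or filter before any model is trained.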

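The dataset is available at https://download.csdn.net/download/2401_84149564/90962954?spm=1011.2124.3001.6210 (because the dataset is very large, only the first 108 rows are listed here; they can all be placed in the "linear.csv" and "spiral.csv" files).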

Trip ID,Taxi ID,Trip Start Timestamp,Trip End Timestamp,Trip Seconds,Trip Miles,Pickup Census Tract,Dropoff Census Tract,Pickup Community Area,Dropoff Community Area,Fare,Tips,Tolls,Extras,Trip Total,Payment Type,Company,Pickup Centroid Latitude,Pickup Centroid Longitude,Pickup Centroid Location,Dropoff Centroid Latitude,Dropoff Centroid Longitude,Dropoff Centroid  Location
011106b6114f83af0c17aace3867a464a7fc742b,4628ef9dfa973bdfe877c5aa9d9738f9dc1204e54f2f1a4cc18141f37e2e66d080533f82510a96d1525b28eee833696f7e1337e9999a38f2fd5babf71585a344,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,982,0.81,17031081800,17031081500,8,8,8.75,3,0,1,13.25,Credit Card,Chicago Independents,41.89321636,-87.63784421,POINT (-87.6378442095 41.8932163595),41.89250778,-87.62621491,POINT (-87.6262149064 41.8925077809)
e9a66ddcc78cfd79f419165314cbe5ee380f16c3,8efe74ab61de459003dcedd85c637ce11bba19bac633cde9559a4895c98d7185ce0f7742dbd8b1938151fc3cdb89a3b4234bf80bf6368654831c79cf9685b3a9,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,390,0.09,,,,,34.86,0,0,0,34.86,Cash,Flash Cab,,,,,,
e765192268db3480b5d9bd0443f7ce7fd5ba047d,6f45c05aa231c9dd389ebdb65ca751cd82ef7634766017a1240d6554bf91840a924cf3cd16564a1ca643c9b1880db706d977ab5164f4b0bd030e0fff21cb3934,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1271,4.18,,,8,6,15.5,0,0,0,15.5,Cash,Flash Cab,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014)
c6510d4f82541cfacf8c20cab44fbb7c0b2c5efe,89fc6b1f0628f328ccd1021fcf4e7318bb2f9962da9259b522bde63ca44f9f201a016291bbd2801fad04845cfd5b30b954afedccb22f6c49fadda05821804a06,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1280,8.19,,,,,22.5,3,0,0,26,Credit Card,Flash Cab,,,,,,
f9445eed26da9a0eff247350df942616cb51e764,14275cab8bb64007379de40be92944817231f63047033992a1964ce85a4a5405085b87a804736c24ecd8a334baa315255c066e271d7f73dfb756a19a24a44d25,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,585,5.02,,,28,6,35,2,0,0,37.5,Credit Card,Sun Taxi,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014)
f74fa03a6cf8cdcf668d0726efa1671d398b4450,2fea69c8a6e08471bc4339a05e9ee7955bef68d791f77a202bd54f3ae41c805907d7ac13a89f86fac4494c976ca87883157baa32ea41f59056661884135f6bba,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1473,9.94,,,56,,26.75,0,0,18,44.75,Cash,Sun Taxi,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),,,
f40c2cda1cea33c2265a34b2ce1eb454067ad8d2,3618045f9110d4d88482266ade23659c1a50d32ac37f205c15614b1ada9d4ca14b171329afed5dfd81c7525bd5a05fe614cb63b2aa48d920626b519e20d9e146,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,0,0,,,24,24,3.25,0,0,11.25,14.5,Credit Card,Taxi Affiliation Services,41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941),41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941)
f3a139c0df3513324ff3f699bf40db2e84291e3a,9b48ad5744e86450fb4db78e7095a6827bafc43a6a9d9a8f656aac46cc0e429d129471cdad31f8a5a97b3a45c8af5fcbc80d003c1c4839075733900786e1a5a9,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1228,4.29,,,6,21,15.25,0,0,0,15.25,Cash,Sun Taxi,41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014),41.9386662,-87.71121059,POINT (-87.7112105933 41.9386661962)
ee7cce18a4b24e080366930ab5ec72d1aeb6556c,adb1cb74113851b651b474182fbe95a9663783779db8cca0fdc3ff7ac82cc8fe5d864b7f86a995c64e405fa6c89cd87d00b458108ebe0c62bc23c4c79e61da46,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1310,18.23,,,28,76,60,0,0,0,60.5,Credit Card,5 Star Taxi,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146)
ed445ada05f17c5f359892eda3c329e1445b5e7b,4b034948aceedd53262ae713f864b0364953a1852b6b24669f192cad26c5014f1af0b6c87b941abb1fa93e1abbe09f70d7f02d48e5371d2c55534b68565a3060,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,4,0,17031320100,17031320100,32,32,20,5.12,0,0,25.62,Credit Card,Sun Taxi,41.88498719,-87.62099291,POINT (-87.6209929134 41.8849871918),41.88498719,-87.62099291,POINT (-87.6209929134 41.8849871918)
ec183abaa7ff142f17ebcdafa1f3d4e611a9f494,f6d1b6c930d62f6d8cbbd8f86a593ff057408c82f764744a7a38ee63957a74b84eaeea80224ea3a0021ba1572323a282530b0659b5e1ce48d04939eacb504060,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,786,2.02,17031081500,17031330100,8,33,10,0,0,0,10,Cash,Chicago Independents,41.89250778,-87.62621491,POINT (-87.6262149064 41.8925077809),41.85934972,-87.61735801,POINT (-87.6173580061 41.859349715)
e5c03bc6d864518431ce24706a4a9055221dc333,99ec13d5d806f5f5fa7a57910f8e38d84f90630529f2f8766d65b47caae8cb7cacf3d4bb9ca6576dfa49bc45b9f0f615e79577ace618514c9c59dc52ffbf40b6,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1064,9.58,,,76,12,25,0,0,5,30,Cash,5 Star Taxi,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.99393013,-87.75835359,POINT (-87.7583535876 41.9939301285)
e2b8bea5dbc60464ff88ba8dc8b66836513101e8,884655d853cbe41e1cdf747969f0dc5b55ed2d5f76c09ae207083297c948a813b2dd57912bfb4a6f4230556ab61363257e586827846cf2a89ff35c7c3b1c08e5,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,279,1.51,,,49,49,6.5,0,0,0,6.5,Cash,Flash Cab,41.70658788,-87.62336651,POINT (-87.6233665115 41.7065878819),41.70658788,-87.62336651,POINT (-87.6233665115 41.7065878819)
dcfbaa5d01e81e18637185fc5e822d6a08456f59,2659a61c08f91c6efd9e7d7947a00006a7bc26aa518241786d51cb05c853cdd86fdd1adc4010867706a2daa9f0da856cb2d7a705c111d3c89e53f5499741247e,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,300,1.8,,,76,7,7.5,4,0,1,13,Credit Card,Globe Taxi,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843)
d51430f93404726b121d82d42efc29a2062895a8,24d4c5e51d147aecbb7c4a1ad70c38dbc05c7b4485f6de4b41fee5ea270228859c19c4160b101df87b889acc5cc7ab49b5e0491676026518677d34ea81f76591,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,379,1.25,,,14,13,6.75,3,0,0,10.25,Credit Card,Taxicab Insurance Agency Llc,41.968069,-87.72155906,POINT (-87.7215590627 41.968069),41.98363631,-87.72358319,POINT (-87.7235831853 41.9836363072)
ce61e0f21271970f7bf3006d489638d1c320a62f,f7782da531b08c6ce5a1e16a8c2998f6f4f7943f29ab53713949ec17f3b4d7f8b4cd8a84da2fe7c4b0a17f1fc3439e376d3bfec25d83652dfa4468342e25f6b1,12/31/2023 11:45:00 PM,01/01/2024 12:45:00 AM,3845,4.98,,,32,6,31,0,0,0,31,Cash,Taxicab Insurance Agency Llc,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014)
cde7d22932829a7b19fb43bfd9a1d635c1e3f04e,52e8915b8a7b8851b341adb6797c3652a198b772561e9e9888d9963de61b796f60653ebdd06f44aa2fad304efb0a53160d6210b2a26fcc64b21932db0d658e32,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1852,22.97,17031980000,,76,,55.5,10,0,35,101,Credit Card,Flash Cab,41.97907082,-87.90303966,POINT (-87.9030396611 41.9790708201),,,
cb50d6951086242beccb8fe7d248cfad3fab3dd7,179f1a051e9e6d3fc0726628962faff68506086ee8df14091e91399452fada055453e421df026fcc4449a43ba54a357c364482e62a20420f836389bd2269b5ee,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1383,17.95,,,28,,43.75,11.06,0,0,55.31,Credit Card,Blue Ribbon Taxi Association,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),,,
c6cb3aad561e0c407239333d535a4922540f9adc,bca79085da78d157007711d04c6e06f655ee5eafb1e5b654033c2f34fcea1d1fd230b48cff6ab67c598d1baa406d0689fd207404711267f2d56fdd93e3a0e6ec,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,839,7.17,,,76,10,19.75,5.05,0,5,30.3,Credit Card,Flash Cab,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.9850151,-87.80453201,POINT (-87.8045320063 41.9850151008)
c5ad9b572bff9d5cb9a8cf0da173789f7910a835,c7a8a53874bbcdb11e70a488485e8bdd0bb8cc0de8f5d98d3ef4d9c3223d7b4024a6d8fda9c00e8e55f3985e5a728ddce0175359c31e0bc5cd3c008d89b23ed1,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,7,0,,,28,28,30,0,0,0,30.5,Credit Card,Flash Cab,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383)
c23cbb41a952defb103a40ca767a32c387532614,32f1eebe57165cc17acba84eb8bb85d69a063ed0e5e15e108a68bc4548403834ab9c888ed3c3d474d10a0f95d498a5a9a0ef801331a61db60adec230c48e61b0,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1163,4.5,,,8,6,16.25,0,0,1,17.25,Cash,Chicago Independents,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014)
b3f31e1c4e3673813abc423c9b2e415bdbd1b3a8,acad560bbc140c4015f4685c6559c93b61ccaf0f7d80143fa408d25169652c7b861a3be42e8fbcb91cac8da7e691489626884bc2bab1ae6015710cde1eab4e3f,12/31/2023 11:45:00 PM,01/01/2024 12:30:00 AM,2305,12.63,,,8,47,37,0,0,0,37,Cash,Taxicab Insurance Agency Llc,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.72818206,-87.5964756,POINT (-87.5964755956 41.728182061)
ae20c529b8423608f3f0bfcfa243100219f1241e,d3e38cf4471f5b65aa0c41a155252c395b7c8593ac6fb5741e0cac5f68831f4f418eebc0f19ea0acd5b398d6b376ec7cbb2dfa9b846af33a9d27146e2a009b4b,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,1,0.04,,,27,27,40,0,0,0,40,Cash,5 Star Taxi,41.8789145,-87.70589713,POINT (-87.7058971305 41.8789144956),41.8789145,-87.70589713,POINT (-87.7058971305 41.8789144956)
ac9c9cd082dbdbfb67fc062dcb74ed713820e47f,75cf3a53aae5e5858361a7ca64f75d3407dc0a44d7bc42843fd566a614cf1adcb57d543a15db44103c4801f879ceb236261b336079807dbfd2cd7a7775f166dc,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,439,0.98,,,32,32,6.5,4,0,1,12,Credit Card,City Service,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841)
a5a598958fc186bb7c09f6eb61fc57ce7c11f898,a31d2ea87ea4f5a4793c30f84f000b0c6aad4cd956f6ea73b5628ebd509d47a7d051a0b7ccd201a550c236c42ed6e19ce5efbd4caa4bf2d1fcd77827b64b39f9,12/31/2023 11:45:00 PM,01/01/2024 12:30:00 AM,3121,7.63,,,8,39,32,0,0,1,33,Cash,Flash Cab,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.80891628,-87.59618334,POINT (-87.5961833442 41.8089162826)
a55c9af7b91b2239e4e432131062cc342f3cd2fe,dd16496faf01009b70959e7c0d5b86f9bb7f432a1771c5737f75542f886f1c16d56afc454b02a96737372836e47c77ffc172b0bdf80e13640ff9203b7d0d6dbc,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,502,0.38,,,8,8,6.25,2,0,0,8.75,Credit Card,Medallion Leasin,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
9f2747ed96c9f1465b93ce1f5114c907464c5d76,c26dee3edb5d4bce731d586ef40b399162a1c3a05cb5bb035e148b89a986a90612bb81bfe1745453c85fe7ae4a859e566611067e2fb730da1c1e5ce92674dd2d,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,0,0,,,8,8,3.25,0,0,1,4.25,Cash,"Taxicab Insurance Agency, LLC",41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
9937207e717533bf0a3e76621f06857138d6c2df,171ec426eaf8f54c5acbb7e3fde8e0683bfa6042af0b00e428e650cd9bc909011a2517de5b32358b9cb9d9ad3d5a5b26bbb14a09f8b7f5c1c9a37fa22f26f7a7,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,12,0.04,,,33,33,3.25,0,0,0,3.25,Cash,Flash Cab,41.85718386,-87.62033462,POINT (-87.6203346241 41.8571838585),41.85718386,-87.62033462,POINT (-87.6203346241 41.8571838585)
8bcad727d56e9761517e7129cc94ede7274f60fa,eed4cbab8d3be11fc5fcff8f92b3ba140f63602f2760446756572fc4262d89af90bd665c89604918253358364f34ae32113be46209e794e28390e6cca1a87768,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,361,1.16,,,28,28,20,4.1,0,0,24.6,Credit Card,Taxicab Insurance Agency Llc,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383)
8142f4b1547f4e4a683a80b5a6c7d0325ce09559,f75191fdf728d7ed7f4277ee1e39372c16658b87abc26a057a7e74b79dd5457cb375f859ea318a2aa47f19d24142bc3563cd5b8c0bfa633161570ec9b3686897,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,600,1.1,17031320100,17031320400,32,32,7.5,0,0,2,9.5,Cash,Chicago Independents,41.88498719,-87.62099291,POINT (-87.6209929134 41.8849871918),41.87740612,-87.62197165,POINT (-87.6219716519 41.8774061234)
7de7d6b1667cea33735670f88c50e9631e719f04,756721b3418247472431e2bd1022cc8ce0806af1b6b7dfeb3927318f86819fc67bf385b5829a0f13e006d05aed02020e7c1801204414d5eadf17b3e2ce71ce14,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1138,12.08,,,56,24,31.25,5,0,7,43.75,Credit Card,Medallion Leasin,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941)
7d27d382cf9f0c93d7d0bf60c14cf7ec523624a0,be7e1462a37397809dadade8e174ef3ccbc3073294df4a0c1786610d3fbbc2cd18543a46938764af03553b89897d23ee9f88307fd485162626822cd306c2fed5,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1236,2.83,,,28,8,13,0,0,0,13,Cash,Blue Ribbon Taxi Association,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
716d7a0a2a097facc3f0f63e326830ecdf923d0a,2d72c5e6313ad93f663008a55045cad0c76164b057dcb756f23448dcfc082f616d8626020794704f296e6ad06f65837ff799930aa961729096167c5ef8612663,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,649,2.57,17031833000,17031330100,28,33,10,1,0,1,12.5,Credit Card,City Service,41.88528132,-87.6572332,POINT (-87.6572331997 41.8852813201),41.85934972,-87.61735801,POINT (-87.6173580061 41.859349715)
6f9899fa6b248a960572d5442018da559c192adb,90a7cf3946c408e70e8d64b08f2bc6819ae5de6159ecef3460c5287031148a66c4c0d4b6b6c53f13919fcb785db502dcd99c94fb60daa9ac6b338f01ed8c3a2b,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,940,5.02,,,1,14,15.75,0,0,0,15.75,Cash,Flash Cab,42.00962288,-87.67016686,POINT (-87.6701668569 42.0096228806),41.968069,-87.72155906,POINT (-87.7215590627 41.968069)
6b1b21ca32da77c68ee5d8816194ac27d9206082,38f6145c9a2b848dc1baa16fd91087e606b12bcb8757a9eb003dfab2c031fcaeb931c1ae6b486fab5f1c21037f33a187d1cb97080f4334a63f7ce0713d0f47b4,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1260,2.8,,,8,28,13,0,0,0,13,Credit Card,Taxi Affiliation Services,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383)
6722a6b9bd13d58f95c3914e12e1ef8b6b48a507,fc9af5a263f70826b274b29067232130b35f23b91479bb66a0655224a22b586ae2c4f88090c3de82a4f428726dd5018b74b0c84627b9e2cf57ee329c5d794575,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1143,1.12,17031320400,17031081500,32,8,10,0,0,0,10,Cash,City Service,41.87740612,-87.62197165,POINT (-87.6219716519 41.8774061234),41.89250778,-87.62621491,POINT (-87.6262149064 41.8925077809)
6138f0b3dff7fc33cd748727eb6714535a747657,35467b44491f6f51eaa0f4fb1cd65e4c23117aa268d9dd52d88a484194323088fc5a0d30455d37c6bced24218c2ed42b421d7260c119161f29b8381bd11b1784,12/31/2023 11:45:00 PM,01/01/2024 12:45:00 AM,3259,1.97,,,28,8,16,3.7,0,2,22.2,Credit Card,Sun Taxi,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
6132f6fb53c7329ed2e12fef54749d2ffc3d4d2f,b71c6761efe32829e7e453b0c6fcb78a456a7d83c720c746ac0575025dd8c5e3cd6b554288cf71419c89931c34166201ab5f47fe928d5d18e377bad66b8750fc,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1308,4.87,,,8,23,16.25,0,0,0,16.25,Cash,Flash Cab,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.9000696,-87.72091824,POINT (-87.7209182385 41.9000696026)
5f54dc81353c871c63b217a7d117c478dadc3a4b,083b7260314e48be5e10a9191da36fb2c0974b91499a5445d8a895ce901d4458b2a95e4fda48ae1ab55dfac3302268fbf967709c30ef58135041ce5d7d844065,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1440,4,,,28,7,15,5,0,0,20,Credit Card,Taxi Affiliation Services,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843)
5e3f05fac03791828973a9d5e273d756478e76a5,ae61536025042a43c682f2450eaa073da8c7a7f736aec5de1dde1d7e0e2c6be21402ea0d779c9b079b91c58fdccc9091ce99dbf01dbf8de1a81648a34c1f267b,12/31/2023 11:45:00 PM,01/01/2024 12:30:00 AM,1980,0,,,8,,43.75,2,0,7,52.75,Credit Card,Choice Taxi Association,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),,,
5d78f62496278dfb2b96a1c2c6ac428f09ea1ffc,6c6606251e8d2b1609f34d755bf884c4d972ab44b47bd75faa7e533a102e1cc2eed88f9d8d25cda28aaf89c678a3b2d17640c226cbb20375fd3f79685b719945,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,65,0.04,,,32,32,3.5,0,0,0,3.5,Cash,Flash Cab,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841)
5bf80daf02fc3a3bab75740a1f72ebc09d4b0fe2,4cebb9edbffeb3a0eace8cccef967730a62f5a978869e216e7855270c48891c6ba7575d0ffe0fc7e5347a411f4b4149a76bb8d65812d6c0607ff975eb7c7f566,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1323,12.94,,,76,,33.25,5.89,0,5.5,45.14,Credit Card,5 Star Taxi,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),,,
5beed53a8f8bf37104223411fef93b1aa2df46e8,644680ecf5bbb5af6329b0c9d4595c39344cd6c50fababc6e2e17811d9cfe0d67ac4a8b828340bf260428913ffd4b8b82dfaa8e1e83da0e939723ee47ae034c9,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,552,0.56,,,32,32,6.5,2,0,1,10,Credit Card,Sun Taxi,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841)
5bcabb19b28a7d07c1e114244476cba232dbfe78,171ec426eaf8f54c5acbb7e3fde8e0683bfa6042af0b00e428e650cd9bc909011a2517de5b32358b9cb9d9ad3d5a5b26bbb14a09f8b7f5c1c9a37fa22f26f7a7,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,463,3.29,,,32,35,11,3,0,0.5,15,Credit Card,Flash Cab,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.83511799,-87.61867777,POINT (-87.6186777673 41.8351179863)
54b2e6aa52ea342d65be8a7ac93a82650e781319,4ae32e2eb244ce143800e0c40055e537cc50e3358a07ce1e877bf9f91aa6c10db986c727b9d4674705f8d124a18b05a68d07d1bc8d70e95e173f77c2c0437c22,12/31/2023 11:45:00 PM,01/01/2024 12:45:00 AM,4044,1.66,,,32,8,26.25,0,0,0,26.25,Cash,City Service,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
4f2165874756524b46ba42d39db0f5c59e1c159d,f9f12d79733b1fa7934f8d9bd17ca1927f3c99ded1640bbd1c77ef4f0e8a5992897a445315545bedf550405a9c5bc5a9f4b8b03c7d06a1f36050a32c9733164d,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,9,0,,,33,33,3.25,0,0,47,50.75,Credit Card,Medallion Leasin,41.85718386,-87.62033462,POINT (-87.6203346241 41.8571838585),41.85718386,-87.62033462,POINT (-87.6203346241 41.8571838585)
4e366fa290c59b3d3c6ced770bc8b6b1d3519a0c,071d031c64f608418d27905c9ffe95bf52695615683d5f4e7072ed77fe2757fe623e369ce677a96e4535360841f5f1ad3f1d6de25ecb0e47e8848ec83bce4da3,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,709,1.21,17031081202,17031081700,8,8,8,0,0,0,8,Cash,5 Star Taxi,41.90278805,-87.62614559,POINT (-87.6261455896 41.9027880476),41.89204214,-87.63186395,POINT (-87.6318639497 41.8920421365)
45b165d46f064d1c685e5fa0ff222437970114f8,c1ffe6edab518145aedcfc816682cbfdcab6ecab156dc3d5b230407ef441db82ba5ad37bcea8436642d6a67a8f92e482dc4005c529ce8d2814d31a9681001bac,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,2,0,17031833000,17031833000,28,28,20,0,0,0,20.5,Credit Card,Flash Cab,41.88528132,-87.6572332,POINT (-87.6572331997 41.8852813201),41.88528132,-87.6572332,POINT (-87.6572331997 41.8852813201)
43dd2fec7bbaa6808d3f6ada656f5969b517c9ae,42560393a9c9b9ae28339f4b5aec77fd89bd49916ad54175d9ee679d69939f973c177065f2816d7990a6663a07270335a4a852190c3258497ba7978edced68c8,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,639,1.41,,,24,8,8.25,4,0,1,13.75,Credit Card,Patriot Taxi Dba Peace Taxi Associat,41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
41bb85ba82698b51f96cebd8915b62767fd0698d,f9448164dcb56f4f31c2b2ad562f31443a01885c2bda20d6325a0747c9857007a024d358c98ed634fba5791a9dcbaf7252302b7dad34fa6bea7e625d2d185d4e,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1183,11.51,,,56,38,30.25,0,0,4,34.25,Cash,Patriot Taxi Dba Peace Taxi Associat,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),41.81294894,-87.61785968,POINT (-87.6178596758 41.8129489392)
408dbbbbb5efe8825b9802e9e47b73bde2cad640,f29ed34900f8b339ab279eda0189ecae3312801dab967e2c71b537bcc8c744c8a8691d428541cc969b2eceae6fc36a8c6bfde2f469eba49c78ac96fa96665d9d,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1303,10.65,,,56,28,28.75,0,0,5.5,34.25,Cash,Sun Taxi,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383)
3e119576753a4a807e8b8702c2caa589a92c153c,f78d14baa2d1f80febaa17d73381c2eadb406cf4537522e111615ca2ccc9854f515cf1bb9dc9a9f48c4fdbf3e4e3adac120d6c6cc00dcd2aa715276b28bed0ad,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,960,2.7,,,7,8,11.5,3,0,1,15.5,Credit Card,Choice Taxi Association,41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
3d4ad7f2659a6f86fecfee4d4f3a8559716ca894,083b7260314e48be5e10a9191da36fb2c0974b91499a5445d8a895ce901d4458b2a95e4fda48ae1ab55dfac3302268fbf967709c30ef58135041ce5d7d844065,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,0,0,,,28,28,3.25,0,0,0,3.25,Cash,Taxi Affiliation Services,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383)
36fbb628f333cf2a39d450485ea41df93d5b2554,2eda36427e0a5394e90d77488294cd75e2fd87f04acb02c2db58dcfcf473ee221e5404b47fad3df4874a934c12a36244bbadb66f265cbab0c3ff00aa25ac3ed0,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1698,14.76,,,76,6,37.75,8.45,0,4,50.7,Credit Card,Taxicab Insurance Agency Llc,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014)
363810b6cfd667eace3ef3266ec553a546729ff5,847cf962bd6f62040673e6c24c24940aeb2d7fdaa54677eed6a0aaa4aeef61984916b32d763b4baa6c32476531543bb77e2346cd64f505618f6b9d562243f950,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,420,0.9,17031320100,17031320400,32,32,6.25,4,0,3,13.25,Credit Card,Taxi Affiliation Services,41.88498719,-87.62099291,POINT (-87.6209929134 41.8849871918),41.87740612,-87.62197165,POINT (-87.6219716519 41.8774061234)
36323a8a14400312e7cee05020326b7bf8dc301e,624e8f2a6af3b7f032d3c40d6f925f6fc5f0bf6a358ecc7d01503a55553689cf6b36732e7d972a664d2d55e928baadb5d5d387884a6b0cf2701453f45c14a7cf,12/31/2023 11:45:00 PM,01/01/2024 12:15:00 AM,1440,15.4,,,76,8,39,10.85,0,4,53.85,Credit Card,"Taxicab Insurance Agency, LLC",41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
327fa02e9cb1cc29e7898cf98830f6ade295f9e9,15ddbeeb791d41c7683b885617281c0b548544f189ee3630ea6205078abf793173f13acb37440222ebe2a7a3b701fdfca26b4a2d5d75921a3218ad63ab23aa3a,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1080,8.1,,,56,,22,7.8,0,16.5,46.3,Credit Card,Taxi Affiliation Services,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),,,
32747746b9fd2ac09476eabaac05c588f4f4cb83,e0e1f19080d131afa810280c286bc1f57e78b48fe55e992ac816f33602973ff890905d82df0bec55f791ff66f6f3d7d366281cdb4271de2f2d2acb047b743f32,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,358,0.61,17031081202,17031081500,8,8,5.75,0,0,1,6.75,Cash,Star North Taxi Management Llc,41.90278805,-87.62614559,POINT (-87.6261455896 41.9027880476),41.89250778,-87.62621491,POINT (-87.6262149064 41.8925077809)
2ab8133db10a059ff43e2abddaa7e19c20352451,c19109878e8ba25e09c0e464f8972f146c9d07502de920483fdbf2ef6686a35003a397903f3c330b3ee8f14feb74f29cb8a294bd950e9af6fed94b9dbc267aae,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1003,10.53,,,76,21,27.5,0,0,5.5,33,Cash,Flash Cab,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.9386662,-87.71121059,POINT (-87.7112105933 41.9386661962)
23a321e48c465182b749d4e3d6fb901b39a28c36,c09f5ee2dc22a2a3c342dd27432eb0fe98506ef3698f5b2e066d6c56fd7da673e58d85db53894ac092f9eeae70bac42cbb470ffff8cedadee60cf639c49fc711,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,6,0,,,3,3,40,0,0,0,40.5,Credit Card,Sun Taxi,41.96581197,-87.65587879,POINT (-87.6558787862 41.96581197),41.96581197,-87.65587879,POINT (-87.6558787862 41.96581197)
21ca9b2d87a053138fe98c5ce8a3152ae752c945,e7f8c9242fc38babca76de5c34b1e59b9b7ae3ff40812cb34a7374980b9cfb20213b8cce120d9bf339e7974754eec9bd823adab5f83852410c0af0b1c0a7b6ec,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,11,0,,,8,8,10,2,0,0,12.5,Credit Card,Taxicab Insurance Agency Llc,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
1f99d4a620dd942bfe2e98dc274214751258bc9c,8eca35a570101ad24c638f1f43eecce9d0cb7843e13a75f0af0c911c3e31ddec549c4808e216bcf31634542025c1e7de2442b92d5d7d73463c4e05fd959e47b4,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1004,5.31,,,8,5,17,4.4,0,0.1,22,Credit Card,Sun Taxi,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.94779159,-87.68383494,POINT (-87.6838349425 41.9477915865)
1d518052b3bcea69bdfec3508886d7551406202d,ea8e6df913a36562d8eddf662abe7722f4c0dc08527e9819364aed7a595eb61abb3728e49185db0616deb070840f410f6672275961076b6fcacc6bdfcf9edcf1,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,1080,8.9,,,8,40,24.5,0,0,0,24.5,Cash,Taxi Affiliation Services,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.79235722,-87.61793138,POINT (-87.6179313803 41.7923572233)
1adedef3b9733f6a1859137ce37d8c685ad36cea,7d2e7cdd59237335e96b9b1a897a5e48cb4df467e6c09242b1e9461256f36aa4f9ab9649279f19ab5c8ad32ad7eb683800a0fd566f3f4fbd48323bab81908955,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,121,0.45,,,8,8,4.25,2,0,0,6.75,Credit Card,City Service,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
1686a96446c079079e6b53574c3d4f78da6fcfdc,64b71bd4e488e9c5571cdfcca045e7cb7a4abb0931f17b9689b4049c2a71df3161c216d73fd2cb53ba148afc4bb00df5bcdc9790cd679e9000319c12e37217e3,12/31/2023 11:45:00 PM,01/01/2024 12:00:00 AM,954,0.41,17031081800,17031081700,8,8,8.5,0,0,1,9.5,Cash,Flash Cab,41.89321636,-87.63784421,POINT (-87.6378442095 41.8932163595),41.89204214,-87.63186395,POINT (-87.6318639497 41.8920421365)
0353da5e93e2f5f973dc685d76fcd15f6bc0256e,ad4b1730fcbfdb84e41313179a688924012db322823f487d70ffcdbf1fa0e9ec11c35045af7e7cf561db41f5a46939ab7ea0565dc6fa26a0d14f68f6f568b92e,12/31/2023 11:45:00 PM,12/31/2023 11:45:00 PM,12,0.57,,,24,24,4.5,0,0,17,21.5,Cash,5 Star Taxi,41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941),41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941)
ef9aabfd57aa87f78421e37dcc6225790e777cb6,093e9e4c05ea53bf75c51763839d5f5bd5d1785c11ee5ec5e805c14bcb833c9fbfd81ab2b7874a85cba14046e54062335b2221738a0bb0bf1ddcfe83a7efa382,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1920,3.76,17031081201,17031330100,8,33,18,0,0,0,18,Cash,City Service,41.89915561,-87.62621053,POINT (-87.6262105324 41.8991556134),41.85934972,-87.61735801,POINT (-87.6173580061 41.859349715)
6ede890339b9db28cce204e37f36a312c2f073d1,3f46ef398d3308fb9794b8c5de450a88439d16c47b77b79398f0e84b804e7aad4789cb5ee08f74c8b7f89a444653706802a31cfdc8b99d3867a16794641fb759,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,56,0.03,17031980000,,76,,3.5,0,0,0,3.5,Cash,Sun Taxi,41.97907082,-87.90303966,POINT (-87.9030396611 41.9790708201),,,
6be0efd956926d6600d23f45470d638f3c5c01c3,c797f1560410b9db343567ea7c8e4095f66ceb65800fa466623d4695efdf3151679fb9bfe88ee18d47096e518c23d9c517e741de11df233e4c4bc11da8c3d8b1,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,578,2.63,,,4,77,9.75,4,0,0,14.25,Credit Card,Flash Cab,41.97517094,-87.68751552,POINT (-87.6875155152 41.9751709433),41.9867118,-87.66341641,POINT (-87.6634164054 41.9867117999)
fc955fb2be6161f771c45ab35fd37b08e13dc1f6,d461dc72b7a599bfba3f33fae867f5530e0c5aa5c200d89b4a5cbd270da1eba6488b76e3ce70a8371b8242f3529dd35a1230f16dc7e7bf2626243840f6261b97,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1046,0.61,17031320100,17031839100,32,32,9.25,2,0,0,11.75,Credit Card,Taxicab Insurance Agency Llc,41.88498719,-87.62099291,POINT (-87.6209929134 41.8849871918),41.88099447,-87.63274649,POINT (-87.6327464887 41.8809944707)
ed59a2b85933b8086d71aa55c04f85bbfa3f37c6,698ec513d27602fcd211bb62440a555a3f23bebbe3b2a1ec9ba6466a63bab46628c6dc1622de7c3dcbe4b0f98a7048c9fb07ae9c8a1572f432db4df68b1a4803,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1504,13.51,,,76,,34.5,8.4,0,7,50.4,Credit Card,Taxicab Insurance Agency Llc,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),,,
e9dfed1215cf9b95d528aabfdd3cab775b255913,0bea3de3c36237d68b009b24ee3db86c78e9e618a73a3b5776e5f4bba06775f91b3520db910d24b97d577e57c4372f5d9d2eb58d338f3a0add0e37c0f71f6701,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,660,0.9,17031081700,17031081201,8,8,7.25,0,0,0,7.25,Cash,Taxi Affiliation Services,41.89204214,-87.63186395,POINT (-87.6318639497 41.8920421365),41.89915561,-87.62621053,POINT (-87.6262105324 41.8991556134)
e9b1cfd8bc49629663f84f697badf17a88b7ab1f,bb4e75d3065311c33024a434640731c43fd2cf9e4482eb9e17cbf9f0ff0ed005455ffe22797df66b7467489a738e7be52c5983e16615b31c7c1d6af3ee0eb965,12/31/2023 11:30:00 PM,01/01/2024 12:15:00 AM,1920,1.2,,,24,74,50,0,0,0,50,Cash,Taxi Affiliation Services,41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941),41.69487897,-87.7131925,POINT (-87.7131924966 41.6948789661)
e92cbef122dee337d7502c7177916016d36e964b,847cf962bd6f62040673e6c24c24940aeb2d7fdaa54677eed6a0aaa4aeef61984916b32d763b4baa6c32476531543bb77e2346cd64f505618f6b9d562243f950,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,720,1.4,17031081700,17031320100,8,32,8.25,4,0,1,13.25,Credit Card,Taxi Affiliation Services,41.89204214,-87.63186395,POINT (-87.6318639497 41.8920421365),41.88498719,-87.62099291,POINT (-87.6209929134 41.8849871918)
e8473ad9fa9148bc0feaccfe68caaeef0ca1f648,259d38cfdbc9ac6f9bb01f0df740e0ddf4a631a70bbdd6525862b20b7ed0e0554dbbde64c2955b8b6f41468c8970e5507490db36348f884461783472621bda08,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,480,1.5,,,8,8,7.5,0,0,1,8.5,Cash,"Taxicab Insurance Agency, LLC",41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
e1968da25611d8cee47d5088502f9ab1c76877c8,56a1119c6ca57e39525cf06829f9ecff553cf4b5ac24821259d086c8ab30406ec45ae77335c646417897d2f4916479c3ed8b6313c2ccb9fb3fc248a4c3387800,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,4,0,17031081403,17031081403,8,8,15,0,0,0,15.5,Credit Card,Medallion Leasin,41.89092203,-87.61886836,POINT (-87.6188683546 41.8909220259),41.89092203,-87.61886836,POINT (-87.6188683546 41.8909220259)
e1739faf183448c03ab821871c96de984ace8697,552720f76dd5338d0cf254f8eb4045839a5501e095a0d34fee849df1633dce909ed9b7e001b6e904f64f5b235fc56377ef450dd8c29f16fcbd7a7c2116386654,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1048,0.88,17031081202,17031081500,8,8,9.5,0,0,1,10.5,Cash,Choice Taxi Association Inc,41.90278805,-87.62614559,POINT (-87.6261455896 41.9027880476),41.89250778,-87.62621491,POINT (-87.6262149064 41.8925077809)
e151cf39ae70e33ac5df78ac76ca2c3706216321,913c95ba782fa447b7c55fbfc38d040907d13e7ddf7282a75fe448d2a25082dcdab927cd930805ea14e62bc534d5288669b15f73751bd43a746cc9e3bbddb2f4,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,830,2.66,,,7,28,10.5,0,0,0,11,Credit Card,Flash Cab,41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383)
de5e1fc2ebc09d6c56a83685e61b15d582059d2b,8be2c5887fd81a4918e0464359436d6fc5ed1dbbe4f5b0317403a4ead72c95b37979d3534c5ede73a29ceace53bdc820f692587ba4a88f651de93b329e4cf2f8,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1320,16.3,,,76,28,40,8.9,0,4,52.9,Credit Card,Taxi Affiliation Services,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383)
dd01d2cbf95e044799a49c5988de327fd0b4ed2f,b875e9e053d893ee490e723c96773ed5f81c0a2339545f941b006167253d2bd537c68266a6c87ecda89948c234e7ae93ae51869767a4e8345492b407ab62e424,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,637,3.96,,,8,6,13,4.35,0,1,18.85,Credit Card,Taxicab Insurance Agency Llc,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014)
daed3f1e3c0cd866cca1f50fe0513d250ea80eff,f9bc93a0ba6b1f18c9709a96c99bb9c5a99054b1711f80ddfe986a0f78a02470f146c48fd7d66bf74fe374f53d032676cb2fb7871afce4314dfdac97d4f22d32,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1674,8.3,,,28,77,25.25,6.44,0,0,32.19,Credit Card,City Service,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.9867118,-87.66341641,POINT (-87.6634164054 41.9867117999)
d949547c3f9f0bbce64bce18b50cca6df60f88f2,b52493d43f7de565ab5eaaa0b1238709ac2073a9cdd626a411f99151188aa290435bada1d7c0119f6423891cbc9c3ce5c9ddd4ed068bca8e8aaf75cedacb9f0d,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,900,7.9,,,76,12,22,5.3,0,4,31.3,Credit Card,Taxi Affiliation Services,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.99393013,-87.75835359,POINT (-87.7583535876 41.9939301285)
d45e012bbd6fffa5ffa8aba12b6d61961c89e9e0,73052f4ccaf4e0fa9178722e491f8e5eda869f56e08aa4d659ef38139d36bd69df925ac00c96564af9fc30db0c616fb4c9312cb3ba36fe8c8cdd470750d4681d,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1320,0,,,56,28,25.5,5.02,0,4,34.52,Credit Card,Taxi Affiliation Services,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383)
d12be01c208e273389ccc1f306cbd9cc98bfc73d,f6aac57dbd69c58200d6fb22bbffe1343ca6ea5eece073d452f97009803408626e6357e405c35bffde1495f078c83419ca0378e26d8dca04c6b81644638cefc9,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1260,0.2,,,6,8,16.5,0,0,0,16.5,Cash,Taxi Affiliation Services,41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
d115aa399492b5b2bd4faed5d6c4fd36122918aa,24d4c5e51d147aecbb7c4a1ad70c38dbc05c7b4485f6de4b41fee5ea270228859c19c4160b101df87b889acc5cc7ab49b5e0491676026518677d34ea81f76591,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1038,10.24,,,76,14,27,9.75,0,5,42.25,Credit Card,Taxicab Insurance Agency Llc,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.968069,-87.72155906,POINT (-87.7215590627 41.968069)
cee99899eff1d446626df83a97b5b5f0571c7ec4,7d8179131ea9952793af4cda8635e94b56c2b92d3c376cd92517f7319ec4a3031207af4d7b8165367e1f8a185275814ab89c26ace551ac3bf96a04ea174371c1,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,696,1.22,17031081500,17031833000,8,28,8,2,0,0,10.5,Credit Card,5 Star Taxi,41.89250778,-87.62621491,POINT (-87.6262149064 41.8925077809),41.88528132,-87.6572332,POINT (-87.6572331997 41.8852813201)
ce3d8fe7f1f0906a2502c728358705f6f547d872,6898e40854937399e0ef25dad63740d21b20593439090721b2f747b039ab24c5e74ebda726b98515a4b4c6b7dd9f87717cfc7c1b52e3a8b4d52245214f09d229,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,16,0.02,,,28,28,35,5.32,0,0,40.82,Credit Card,Flash Cab,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383)
ce0f3501cd03c9e1795241db4da2a1285559f906,1de191ccc486f8d0e0e6b25a03d592e58ed4511cfed79e912c895802cb808ea9a2d609cc77a536d7ad6431160a92d39dc6b79ec88381059d540f95215364582e,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1260,11.2,,,76,2,29,5,0,4,38,Credit Card,Taxi Affiliation Services,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),42.00157103,-87.69501259,POINT (-87.6950125892 42.001571027)
cc3f3f6214a8b4ad9f15f472e0ea734b441728fa,599e7935e8f7321862152296420d8552c36d7fe97517f0bec1048c18ed7f2a434e06c55f5e16e5ec5d1da15f02eb49079258d610b196bd9eb0cf4183166878e4,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,776,4.3,,,56,30,13.75,4.81,0,5,24.06,Credit Card,Flash Cab,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),41.83908691,-87.71400381,POINT (-87.714003807 41.8390869059)
cb026ca7cca9c89c7bf96c3efcd12376b2800fc3,4477f5eda3c0c9379d7526db1b5029184a7d75a2adcad3b338b20c83f351865360b02546bc50125c663edc0ed86b206261dc50f7f002199e4d0880802c51311d,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1028,5.45,,,8,6,17,4.38,0,0,21.88,Credit Card,Medallion Leasin,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014)
cab8410b2d60210e11fa09bc929a1a6ba0696084,e533bfdc483206f9c02c1c879a118d88f0a3ca1cd2703f3cf88e318716bbbb0c71d5f1c5f86b042b4ee1a06dbc750fa840acec0ebaf5fc1d90edbdc215114a1d,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1736,17.38,17031980000,17031081201,76,8,53.94,0,0,0,53.94,Cash,Flash Cab,41.97907082,-87.90303966,POINT (-87.9030396611 41.9790708201),41.89915561,-87.62621053,POINT (-87.6262105324 41.8991556134)
c38ed8ec5467492783aed71363709b87a31ac8d9,a62df4e9bfec5f3babb7922b1346263cce5c3116fa5fa3465e4845b94774ef86b30bb243f80f396cf2211d7e3309028193f159567ee341b44d39dc3a1f5495f9,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1204,4.79,,,6,15,15.75,0,0,0,15.75,Cash,Medallion Leasin,41.9442266,-87.65599818,POINT (-87.6559981815 41.9442266014),41.95402765,-87.76339903,POINT (-87.7633990316 41.9540276487)
bfbc39a914248481c0b5c4899d1d0ca4f54f9851,d2a9362483decbe7b2d28d38ff371f05fafd542a60e8c9e4d5e3150e2c0b41b9e2b69e561807859b9ccb8c0f27922c183460b9f858df03f53b9656b8b829ceb1,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,2168,23.1,17031980000,17031839100,76,32,57.5,0,0,5,62.5,Cash,Flash Cab,41.97907082,-87.90303966,POINT (-87.9030396611 41.9790708201),41.88099447,-87.63274649,POINT (-87.6327464887 41.8809944707)
bdb3a185915415969458eeaf805d9a0012252754,3b95cedc13d4a99243e1974616a6a25267c25878336faa586d4372370c847c3618753718dca887e3713efb476a5af11bc5f5a86c9785c9f749ca45f3f8be4764,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,480,0,17031839100,17031833000,32,28,7.5,0,0,0,7.5,Cash,Taxi Affiliation Services,41.88099447,-87.63274649,POINT (-87.6327464887 41.8809944707),41.88528132,-87.6572332,POINT (-87.6572331997 41.8852813201)
bcab4490118535b7135fe1394c37507df7f6ad90,3665a72ee495b03f4dae72307dc6e5e58e21518f77d8e67dcd386c3b9daa1a0db86555cef4a877234542af8d1c0da6fa7a28a4e0e643e382236470d569d78668,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1800,14.6,,,76,,37,8.7,0,6,51.7,Credit Card,Taxi Affiliation Services,41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),,,
b8f99264a52fe95e5160333451d51cd83f3b34c8,924ad289d7377302678c3954095a96778a3a5b2a9a2a69d5335f59ff00e672ea71a0d95fe058ecfa5d53fb86dc1eba63a1af3e51ab33ec04e8b5a679c91f564b,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1380,0,,,76,24,41,0,0,5,46,Cash,"Taxicab Insurance Agency, LLC",41.98026432,-87.9136246,POINT (-87.913624596 41.9802643146),41.90120699,-87.67635599,POINT (-87.6763559892 41.9012069941)
b819f56e53b38f9d75067716f5701a3bdbae8761,422aa525858cddde977f39fa4e58947555918726746ebd72be48d2a2d09af86e2b5e5318fea36ecc84de5b6af8354307064e73672f67e4bc907dcabc21e61c09,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1020,0.2,,,28,7,12.75,0,0,0,12.75,Cash,Taxi Affiliation Services,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843)
b1a91a0cbb11e273aa8fe96eaf32cb26389570cf,c0d525ee45b1b77f1fcc69c7c56ff91661795d15482cc46a75ca8164ea25736a32169b1ba73fb5eee3ee98e629942c90ed23a5998f29bfe050afd3a08608a9a9,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,1140,0.1,,,32,8,11.5,0,0,2,13.5,Cash,Taxi Affiliation Services,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
aa31ae6e712e0a21598087799a7439a280301056,82bc059c3b13e97341f941d60f772ae9f83687498e91f7c399644ec42449cced734834174cb0a29955229b910c3c9810dc67997226ce38a8600bf7f24d149423,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,5,0,,,32,32,25,5.1,0,0,30.6,Credit Card,5 Star Taxi,41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841),41.87886558,-87.62519214,POINT (-87.6251921424 41.8788655841)
a96c2ef996a9458f48eecfe4f62e2fcb0790cb9d,e8d374b4e7bc344add5893f1a1ae3b611823439ac1caf06087c4bf2cb6fe114201a38c49646f2851d86f9e73c21c1e99f29903cceb84612fd1118e3224528b2b,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,720,0.4,,,,,7.5,0,0,0,7.5,Cash,Taxi Affiliation Services,,,,,,
a4d44dd31babbf742d35571b483e2f8ab7f5256a,0cae7ec64456b1830bd58df1991f046410f5506cf28b3aa16b6d5c4940b44ff0ca069324233093161b43d212c2c5eac61536cfa6e3284117bb62ee4105e945b1,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,0,0,,,77,,3.25,0,0,0,3.25,Cash,Taxi Affiliation Services,41.9867118,-87.66341641,POINT (-87.6634164054 41.9867117999),,,
a2136304c06eeb1897684e0402905d1d2b528cc8,42560393a9c9b9ae28339f4b5aec77fd89bd49916ad54175d9ee679d69939f973c177065f2816d7990a6663a07270335a4a852190c3258497ba7978edced68c8,12/31/2023 11:30:00 PM,12/31/2023 11:30:00 PM,239,0.7,,,28,8,5.25,2,0,1,8.75,Credit Card,Patriot Taxi Dba Peace Taxi Associat,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
a1383cbd5fab084a75d9a0c6302d33e3cb6104d3,b4ac2893286a7c3a55df851a3732ea65d7fb82e1da7a19f728a71651761babfd88544152301073319650be263fd4e1aabc072601e6f452daf22ceb764f8d70d5,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,475,0.45,,,8,8,6,0,0,1,7,Cash,City Service,41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111),41.89960211,-87.63330804,POINT (-87.6333080367 41.899602111)
9f072b4e70e16ebb84b0a2f6ff718150ecc3e345,00f4b381570486f8575cbaa57ed41f116ed2e1f9d85f73bb2f6dba13a72541761d2ad4cb1727990d97795a2b0bdd99f0e4a8826245c81dac443cebb1c19b26fb,12/31/2023 11:30:00 PM,01/01/2024 12:15:00 AM,2460,1.1,,,56,3,48.25,16.4,0,6,70.65,Credit Card,Taxi Affiliation Services,41.79259236,-87.76961545,POINT (-87.7696154528 41.7925923603),41.96581197,-87.65587879,POINT (-87.6558787862 41.96581197)
9ca870a06a41bcf8a0472890df4d404d7d592d5d,42e3ec7750e4be6e56c47bcdefe5cb86ddb0d0c65bcf4d09773512b3e854ed08adeacdad835a4e92a8ca871021858984bb70a72c1dc17d22b49d2f664a6e0fd2,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1399,16.2,17031980000,17031839100,76,32,40.25,0,0,29,69.25,Cash,Taxicab Insurance Agency Llc,41.97907082,-87.90303966,POINT (-87.9030396611 41.9790708201),41.88099447,-87.63274649,POINT (-87.6327464887 41.8809944707)
97d0f9bb2bc7aed4e8c84da3444743f9f1256d32,f1c4fb891f4812fb2865e801d2185b401283b34401b71f25cafc8b108f48241363276826a3fe8f4830d1979de0179f5850a26e115de686d6af99b79e66218656,12/31/2023 11:30:00 PM,01/01/2024 12:00:00 AM,1680,0.6,,,28,77,30,5,0,1,36,Credit Card,Taxi Affiliation Services,41.87400538,-87.66351755,POINT (-87.6635175498 41.874005383),41.9867118,-87.66341641,POINT (-87.6634164054 41.9867117999)
97059ca7943e828e9b3b5da926d6f27d6ddc9f30,f81c929ea7d9107e6de8bd7ee335f42563b3413e967e98288480648a66455138dcdcde8b46b353ca4d6c287be49cd3087636ba13de6b7db6db3854c2ac8a157f,12/31/2023 11:30:00 PM,12/31/2023 11:45:00 PM,240,0.9,,,7,7,5.25,0,0,1,6.25,Cash,Taxi Affiliation Services,41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843),41.92268628,-87.64948873,POINT (-87.6494887289 41.9226862843)

II. Experimental Steps

(1) Experiment topic: classification prediction with SVM

Program output:

============================================================

基于SVM进行分类预测

============================================================

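(2) Loading the CSV file

Mathematical model: the input is a data matrix $X \in \mathbb{R}^{n \times d}$ and a label vector $y \in \{-1,1\}^n$.

  1. Selecting the data: the CSV file used here has 2,002 lines (one header line plus 2,001 records) and 23 columns. This falls short of the 100,000-record requirement. The smaller sample was used because training on 100,000 records on a CPU was too slow to produce a result in reasonable time. The program reports the following shape and data types: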

步骤1: 数据加载

文件 linear.csv 原始形状: (2001, 23)

前几行数据:

                                    Trip ID  ...            Dropoff Centroid  Location

0  011106b6114f83af0c17aace3867a464a7fc742b  ...  POINT (-87.6262149064 41.8925077809)

1  e9a66ddcc78cfd79f419165314cbe5ee380f16c3  ...                                   NaN

2  e765192268db3480b5d9bd0443f7ce7fd5ba047d  ...  POINT (-87.6559981815 41.9442266014)

3  c6510d4f82541cfacf8c20cab44fbb7c0b2c5efe  ...                                   NaN

4  f9445eed26da9a0eff247350df942616cb51e764  ...  POINT (-87.6559981815 41.9442266014)

[5 rows x 23 columns]

数据类型:

Trip ID                        object

Taxi ID                        object

Trip Start Timestamp           object

Trip End Timestamp             object

Trip Seconds                  float64

Trip Miles                    float64

Pickup Census Tract           float64

Dropoff Census Tract          float64

Pickup Community Area         float64

Dropoff Community Area        float64

Fare                          float64

Tips                          float64

Tolls                         float64

Extras                        float64

Trip Total                    float64

Payment Type                   object

Company                        object

Pickup Centroid Latitude      float64

Pickup Centroid Longitude     float64

Pickup Centroid Location       object

Dropoff Centroid Latitude     float64

Dropoff Centroid Longitude    float64

Dropoff Centroid  Location     object

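The data types above guide the column selection. The latitude and longitude columns vary over a very narrow range, so they are left out. Among the non-numeric (object) columns, only column 16, Payment Type, is used: it is the label, mapped so that Cash becomes -1 and Credit Card becomes 1. Columns 5 to 15 are numeric (float64) and can be used directly as features. The experiment therefore keeps columns 5 to 16.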
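The selection described above can be sketched with pandas by column name rather than by position. This is a minimal illustration, not the report's loader: the three-row sample, the `feature_cols` subset, and the `label_map` name are assumptions made for the example.

```python
import io
import pandas as pd

# Hypothetical three-row sample that reuses a few of the dataset's column names.
csv_text = """Trip Seconds,Trip Miles,Fare,Tips,Payment Type
982,0.81,8.75,3,Credit Card
480,1.5,7.5,0,Cash
600,2.0,9.0,0,Mobile
"""
df = pd.read_csv(io.StringIO(csv_text))

feature_cols = ["Trip Seconds", "Trip Miles", "Fare", "Tips"]  # assumed subset
label_map = {"Cash": -1, "Credit Card": 1}

# Keep only rows whose payment type is one of the two target classes.
df = df[df["Payment Type"].isin(list(label_map))]
X = df[feature_cols].to_numpy(dtype=float)
y = df["Payment Type"].map(label_map).to_numpy(dtype=int)

print(X.shape)     # (2, 4)
print(y.tolist())  # [1, -1]
```

Selecting by name keeps the code readable if the column order of the file ever changes; positional indices such as 4 to 15 work equally well for a fixed file layout.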

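2. Processing the data: feature-column processing function

Process model: each column $X[:, j]$ of the feature matrix is converted to numeric values and its missing values are filled:

If the conversion succeeds, $X'[:, j] = \text{numeric}(X[:, j])$.

If missing values (NaN) remain, they are filled with the column mean: $X'[i, j] = \frac{1}{|\{k | X'[k, j] \neq \text{NaN}\}|}\sum_{k: X'[k, j] \neq \text{NaN}} X'[k, j]$

Numeric conversion and missing-value handling

Mathematical model: for a single vector $v$, the same conversion and filling proceed as follows:

Try to convert $v$ into a numeric vector $v'$.

Compute the mean of the observed entries of $v'$: $\mu = \frac{1}{|\{i | v'[i] \neq \text{NaN}\}|}\sum_{i: v'[i] \neq \text{NaN}} v'[i]$

Fill the missing values: $v'[i] = \mu \text{ if } v'[i] = \text{NaN}$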
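The conversion and mean filling above can be sketched in a few lines. The helper name `to_numeric_with_mean_fill` and the sample values are assumptions for illustration; `pd.to_numeric(..., errors="coerce")` is the same call the listing in Section III relies on.

```python
import numpy as np
import pandas as pd

def to_numeric_with_mean_fill(series: pd.Series) -> np.ndarray:
    """Coerce a column to float and replace NaN with the mean of observed values."""
    numeric = pd.to_numeric(series, errors="coerce")  # unparseable entries become NaN
    if numeric.notna().any():
        numeric = numeric.fillna(numeric.mean())      # Series.mean() skips NaN
    else:
        numeric = numeric.fillna(0.0)                 # column with no usable value
    return numeric.to_numpy(dtype=float)

# Made-up column: two valid numbers, one missing entry, one unparseable string.
col = pd.Series(["8.75", None, "7.5", "n/a"])
filled = to_numeric_with_mean_fill(col)
print(filled.tolist())  # [8.75, 8.125, 7.5, 8.125]
```

Mean filling keeps every row but shrinks the variance of heavily missing columns, which may matter for the two census-tract columns, where most entries are missing.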

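Label processing function (the label column is handled separately)

Mathematical model: for the label vector $y$, define a mapping function $f$.

For string labels, $f(\text{"Cash"}) = -1$ and $f(\text{"Credit Card"}) = 1$.

For numeric labels, $f(x) = -1 \text{ if } x \leq 0 \text{ else } 1$

For a missing label (NaN), the sample is skipped.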
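A minimal sketch of the label mapping follows, assuming a hypothetical helper `map_labels` that drops the feature row together with any label it cannot map, so that $X$ and $y$ stay aligned. Unlike the listing in Section III, this sketch discards unrecognized payment types instead of assigning them a default class; that is a design choice of the example.

```python
import numpy as np

def map_labels(X, labels):
    """Map 'Cash' -> -1 and 'Credit Card' -> 1.

    Rows whose label is missing or unrecognized are removed from both
    X and y so the two arrays keep the same length and order.
    """
    mapping = {"cash": -1, "credit card": 1}
    keep, y = [], []
    for idx, label in enumerate(labels):
        key = label.strip().lower() if isinstance(label, str) else None
        if key in mapping:
            keep.append(idx)
            y.append(mapping[key])
    return np.asarray(X)[keep], np.array(y, dtype=int)

# Made-up data: four rows, one missing label, one unrecognized label.
X_demo = np.arange(8).reshape(4, 2)
labels_demo = ["Cash", None, "Credit Card", "Mobile"]
X_kept, y_kept = map_labels(X_demo, labels_demo)
print(X_kept.tolist(), y_kept.tolist())  # [[0, 1], [4, 5]] [-1, 1]
```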

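Columns 5 to 16 may contain missing values and outliers, so the features are cleaned as described above and then standardized. The processing output is: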

dtype: object

处理列 '特征列 Trip Seconds', 原始类型: int64

处理列 '特征列 Trip Miles', 原始类型: float64

处理列 '特征列 Pickup Census Tract', 原始类型: float64

列 '特征列 Pickup Census Tract' 中有 1287 个值无法转换为数字,将使用均值填充

处理列 '特征列 Dropoff Census Tract', 原始类型: float64

列 '特征列 Dropoff Census Tract' 中有 1337 个值无法转换为数字,将使用均值填充

处理列 '特征列 Pickup Community Area', 原始类型: float64

列 '特征列 Pickup Community Area' 中有 66 个值无法转换为数字,将使用均值填充

处理列 '特征列 Dropoff Community Area', 原始类型: float64

列 '特征列 Dropoff Community Area' 中有 317 个值无法转换为数字,将使用均值填充

处理列 '特征列 Fare', 原始类型: float64

列 '特征列 Fare' 中有 2 个值无法转换为数字,将使用均值填充

处理列 '特征列 Tips', 原始类型: float64

列 '特征列 Tips' 中有 2 个值无法转换为数字,将使用均值填充

处理列 '特征列 Tolls', 原始类型: float64

列 '特征列 Tolls' 中有 2 个值无法转换为数字,将使用均值填充

处理列 '特征列 Extras', 原始类型: float64

列 '特征列 Extras' 中有 2 个值无法转换为数字,将使用均值填充

处理列 '特征列 Trip Total', 原始类型: float64

列 '特征列 Trip Total' 中有 2 个值无法转换为数字,将使用均值填充

成功加载文件: linear.csv, 特征数据形状: (2000, 11), 标签数量: 2000

处理标签数据,类型: <class 'numpy.ndarray'>, 形状: (2000,)

标签的唯一值: ['Cash' 'Credit Card']

处理后的标签分布: -1 (Cash): 969, 1 (Credit Card): 1031

数据处理完成 - 特征维度: (2000, 11), 标签分布: 负类(-1): 969, 正类(1): 1031

成功加载CSV文件

(3) Data inspection and preprocessing

The data dimensions and class distribution are checked. The output is:

步骤2: 数据检查与预处理

数据维度: X=(2000, 11), y=(2000,)

类别分布 - 类别(-1): 969, 类别(1): 1031

(4) Data standardization

The data are standardized and split into a training set (80%) and a test set (20%). The output is:
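The 80/20 split and standardization can be sketched with scikit-learn as below. The data are synthetic stand-ins with the same shape as the report's feature matrix. Fitting the scaler on the training portion only, and the `stratify` and `random_state` settings, are assumptions of this sketch rather than settings stated in the report.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(2000, 11))  # synthetic stand-in features
y = np.where(rng.random(2000) < 0.5, -1, 1)           # synthetic stand-in labels

# 80% training, 20% test, preserving the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Learn mean and standard deviation from the training set only,
# then apply the same transformation to the test set.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

print(X_train_std.shape, X_test_std.shape)  # (1600, 11) (400, 11)
```

Scaling with training-set statistics keeps information about the test set out of the model, which makes the reported test accuracy a fairer estimate.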

步骤3: 数据标准化

训练集大小: (1600, 11)

测试集大小: (400, 11)

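(5) Training with a hand-written SMO algorithm

1. Principle of the support vector machine

        The basic idea of the support vector machine (SVM) is to find an optimal hyperplane in feature space such that samples of different classes lie on opposite sides of it and the margin between them is as large as possible.

Primal optimization problem:

                                                        $\min_{w, b} \frac{1}{2} \|w\|^2$

subject to:

                                             $y_i(w^T x_i + b) \geq 1, \forall i=1,2,...,n$

To handle data that are not linearly separable, slack variables $\xi_i$ and a penalty parameter $C$ are introduced:

                                                        $\min_{w, b, \xi} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i$

subject to:

                                    $y_i(w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \forall i=1,2,...,n$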
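For a fixed $(w, b)$ the smallest feasible slack is $\xi_i = \max(0, 1 - y_i(w^T x_i + b))$, so the soft-margin objective can be evaluated directly. The sketch below does this on three made-up points; the function name `soft_margin_objective` is hypothetical.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Return (0.5*||w||^2 + C*sum(xi), xi) with xi_i = max(0, 1 - y_i*(w.x_i + b))."""
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)  # slack is zero for points outside the margin
    value = 0.5 * float(w @ w) + C * float(xi.sum())
    return value, xi

# Three made-up points on a line; the middle one falls inside the margin.
X_demo = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0]])
y_demo = np.array([1, 1, -1])
value, xi = soft_margin_objective(np.array([1.0, 0.0]), 0.0, X_demo, y_demo, C=1.0)
print(value, xi.tolist())  # 1.0 [0.0, 0.5, 0.0]
```

A larger $C$ weights the slack term more heavily, so margin violations cost more relative to the width of the margin.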

2. Lagrangian dual problem

Introducing Lagrange multipliers turns the primal problem into its dual:

                                               $\max_{\alpha} \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n}\alpha_i \alpha_j y_i y_j K(x_i, x_j)$

subject to:

                                                 $0 \leq \alpha_i \leq C, \quad \sum_{i=1}^{n}\alpha_i y_i = 0$

3. Kernel functions

The kernel function $K(x_i, x_j)$ computes an inner product in a high-dimensional feature space. Commonly used kernels include:

Linear kernel: $K(x_i, x_j) = x_i^T x_j$

Polynomial kernel: $K(x_i, x_j) = (x_i^T x_j + 1)^d$

RBF kernel: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, where $\gamma = \frac{1}{2\sigma^2}$

Sigmoid kernel: $K(x_i, x_j) = \tanh(\beta x_i^T x_j + c)$
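The four kernels listed above translate directly into NumPy. This is a sketch that follows the formulas in this section term by term; the function names `k_linear`, `k_poly`, `k_rbf`, and `k_sigmoid` and the default parameter values are assumptions chosen for the example.

```python
import numpy as np

def k_linear(x, z):
    """K(x, z) = x^T z"""
    return float(np.dot(x, z))

def k_poly(x, z, d=3):
    """K(x, z) = (x^T z + 1)^d"""
    return float((np.dot(x, z) + 1.0) ** d)

def k_rbf(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2), with gamma = 1 / (2 * sigma^2)"""
    diff = x - z
    return float(np.exp(-gamma * np.dot(diff, diff)))

def k_sigmoid(x, z, beta=0.1, c=0.0):
    """K(x, z) = tanh(beta * x^T z + c)"""
    return float(np.tanh(beta * np.dot(x, z) + c))

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.0])
print(k_linear(x, z))     # 3.0
print(k_poly(x, z, d=2))  # 16.0
print(k_rbf(x, z))        # exp(-4), since ||x - z||^2 = 8 and gamma = 0.5
```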

4. Sequential minimal optimization (SMO)

SMO optimizes two Lagrange multipliers at a time and iterates until no multiplier changes. The key steps are:

(1) Select the multipliers: pick an $\alpha_i$ that violates the KKT conditions and a second multiplier $\alpha_j$ with $j \neq i$.

(2) Compute the bounds $L$ and $H$ on $\alpha_j$ implied by the constraints $\sum_{i=1}^{n}\alpha_i y_i = 0$ and $0 \leq \alpha_i \leq C$:

   When $y_i \neq y_j$: $L = \max(0, \alpha_j - \alpha_i)$, $H = \min(C, C + \alpha_j - \alpha_i)$

   When $y_i = y_j$: $L = \max(0, \alpha_i + \alpha_j - C)$, $H = \min(C, \alpha_i + \alpha_j)$

(3) Update $\alpha_j$:

                                                       $\alpha_j^{new} = \alpha_j^{old} - \frac{y_j (E_i - E_j)}{\eta}$

   where $E_i = f(x_i) - y_i$ is the prediction error and $\eta = 2K(x_i, x_j) - K(x_i, x_i) - K(x_j, x_j)$. Here $f(x_i)$ denotes the raw output $\sum_{k=1}^{n} \alpha_k y_k K(x_k, x_i) + b$, before the sign is taken. A step is made only when $\eta < 0$.

(4) Clip $\alpha_j$ to $[L, H]$: $\alpha_j^{new} = \begin{cases} H, & \alpha_j^{new} > H \\ \alpha_j^{new}, & L \leq \alpha_j^{new} \leq H \\ L, & \alpha_j^{new} < L \end{cases}$

(5) Update $\alpha_i$:

 $\alpha_i^{new} = \alpha_i^{old} + y_i y_j (\alpha_j^{old} - \alpha_j^{new})$

(6) Compute the intercept $b$:

 $b_1 = b - E_i - y_i(\alpha_i^{new} - \alpha_i^{old})K(x_i, x_i) - y_j(\alpha_j^{new} - \alpha_j^{old})K(x_i, x_j)$

$b_2 = b - E_j - y_i(\alpha_i^{new} - \alpha_i^{old})K(x_i, x_j) - y_j(\alpha_j^{new} - \alpha_j^{old})K(x_j, x_j)$

   If $0 < \alpha_i^{new} < C$, then $b = b_1$.

   Otherwise, if $0 < \alpha_j^{new} < C$, then $b = b_2$.

   Otherwise $b = \frac{b_1 + b_2}{2}$
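Steps (2) to (6) for one pair of multipliers can be sketched as below. The function name `smo_pair_update` and its argument layout are assumptions of this example. It takes a precomputed kernel matrix `K` and the current errors $E$, and it uses the sign convention $\eta = 2K_{ij} - K_{ii} - K_{jj}$, so the step subtracts the fraction.

```python
import numpy as np

def smo_pair_update(i, j, alpha, b, y, K, C, errors):
    """One SMO step on (alpha_i, alpha_j). Returns (new_alpha, new_b), or None if skipped."""
    # Step (2): box bounds on alpha_j.
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = 2.0 * K[i, j] - K[i, i] - K[j, j]
    if L == H or eta >= 0:
        return None
    new = alpha.copy()
    # Steps (3) and (4): unconstrained update of alpha_j, then clipping to [L, H].
    new[j] = np.clip(alpha[j] - y[j] * (errors[i] - errors[j]) / eta, L, H)
    # Step (5): move alpha_i so that sum(alpha * y) stays unchanged.
    new[i] = alpha[i] + y[i] * y[j] * (alpha[j] - new[j])
    # Step (6): intercept candidates.
    di, dj = new[i] - alpha[i], new[j] - alpha[j]
    b1 = b - errors[i] - y[i] * di * K[i, i] - y[j] * dj * K[i, j]
    b2 = b - errors[j] - y[i] * di * K[i, j] - y[j] * dj * K[j, j]
    if 0 < new[i] < C:
        new_b = b1
    elif 0 < new[j] < C:
        new_b = b2
    else:
        new_b = (b1 + b2) / 2.0
    return new, new_b

# Two made-up points, x_0 = (1, 0) with y = +1 and x_1 = (-1, 0) with y = -1, linear kernel.
X_demo = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_demo = np.array([1.0, -1.0])
K_demo = X_demo @ X_demo.T
alpha0 = np.zeros(2)
errors0 = (alpha0 * y_demo) @ K_demo + 0.0 - y_demo  # E_i = f(x_i) - y_i with b = 0
alpha1, b1_demo = smo_pair_update(0, 1, alpha0, 0.0, y_demo, K_demo, 1.0, errors0)
print(alpha1.tolist(), b1_demo)  # [0.5, 0.5] 0.0
```

In this toy case a single step already reaches the optimum: $w = \sum_i \alpha_i y_i x_i = (1, 0)$ and both points sit exactly on the margin.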

 (7) Decision function

After optimization, the decision function is:

$f(x) = \text{sign}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right)$
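Given trained multipliers, the decision function can be evaluated as in the sketch below. The helper names `decision_values` and `predict` are hypothetical, and the multipliers are set by hand for a two-point toy problem. Ties at exactly zero are assigned to the positive class, which is a choice of this sketch.

```python
import numpy as np

def decision_values(alpha, y_train, X_train, b, kernel, X_new):
    """Raw outputs sum_i alpha_i * y_i * K(x_i, x) + b for each row x of X_new."""
    values = []
    for x in X_new:
        k = np.array([kernel(xi, x) for xi in X_train])
        values.append(float(np.sum(alpha * y_train * k) + b))
    return np.array(values)

def predict(values):
    """Apply the sign; a raw value of exactly 0 is mapped to +1 here."""
    return np.where(values >= 0, 1, -1)

# Hand-set toy model: two training points with alpha = 0.5 each and b = 0.
X_train_demo = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_train_demo = np.array([1.0, -1.0])
alpha_demo = np.array([0.5, 0.5])
X_new_demo = np.array([[2.0, 1.0], [-0.5, 3.0]])

vals = decision_values(alpha_demo, y_train_demo, X_train_demo, 0.0, np.dot, X_new_demo)
print(vals.tolist(), predict(vals).tolist())  # [2.0, -0.5] [1, -1]
```

Only samples with $\alpha_i > 0$ (the support vectors) contribute to the sum, so in practice the loop can be restricted to them.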

The number of support vectors, the weight vector, and the bias term are printed as follows:

步骤4: 手动SMO算法训练

支持向量个数: 1384

权重向量 w = [-0.1751, 0.9524]

偏置项 b = -0.2822

Decision boundary visualization

Mathematical model: the decision boundary is drawn from the SVM decision function.

手动SMO SVM - 准确率: 0.5975, 精确率: 0.7068, 召回率: 0.4352, F1: 0.5387

(6) Comparing different kernel functions

Confusion matrix and evaluation metrics

Mathematical model: classification performance metrics

True positives (TP): $|\{i : y_{true}[i] = 1 \text{ and } y_{pred}[i] = 1\}|$

True negatives (TN): $|\{i : y_{true}[i] = -1 \text{ and } y_{pred}[i] = -1\}|$

False positives (FP): $|\{i : y_{true}[i] = -1 \text{ and } y_{pred}[i] = 1\}|$

False negatives (FN): $|\{i : y_{true}[i] = 1 \text{ and } y_{pred}[i] = -1\}|$

Metric definitions:

 Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$

 Precision: $\frac{TP}{TP + FP}$

Recall: $\frac{TP}{TP + FN}$

 F1 score: $\frac{2 \times Precision \times Recall}{Precision + Recall}$
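The four counts and the derived metrics can be computed as in this sketch, with labels in $\{-1, 1\}$ and Credit Card (1) as the positive class. The function name `binary_metrics` and the five made-up predictions are assumptions for illustration.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for labels in {-1, 1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == -1) & (y_pred == -1)))
    fp = int(np.sum((y_true == -1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == -1)))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0  # guard against division by zero
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return accuracy, precision, recall, f1

# Made-up example: TP = 2, FN = 1, FP = 1, TN = 1.
acc, prec, rec, f1 = binary_metrics([1, 1, 1, -1, -1], [1, -1, 1, 1, -1])
print(acc, round(prec, 4), round(rec, 4), round(f1, 4))  # 0.6 0.6667 0.6667 0.6667
```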

SVMs with linear, RBF, polynomial, and sigmoid kernels are compared by computing accuracy, precision, recall, and F1 for each. The output is:

步骤5: 不同核函数比较

线性核 SVM - 准确率: 0.9275, 精确率: 1.0000, 召回率: 0.8657, F1: 0.9280

RBF核 SVM - 准确率: 0.9300, 精确率: 0.9896, 召回率: 0.8796, F1: 0.9314

多项式核 SVM - 准确率: 0.8875, 精确率: 0.9476, 召回率: 0.8380, F1: 0.8894

Sigmoid核 SVM - 准确率: 0.7600, 精确率: 0.7857, 召回率: 0.7639, F1: 0.7746

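(7) Automatic hyperparameter tuning

Grid search for automatic tuning

Mathematical model: grid search with cross-validation is used to find the best hyperparameters.

Cross-validation procedure:

1. Split the data into $k$ folds.

2. For each parameter combination $(C, \gamma, ...)$, compute the cross-validation score $CV(C, \gamma) = \frac{1}{k}\sum_{i=1}^{k}Score_i$, where $Score_i$ is the score on fold $i$ of a model trained on the other $k-1$ folds.

3. Select the best parameter combination $(C^*, \gamma^*) = \arg\max_{C, \gamma} CV(C, \gamma)$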
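The procedure above corresponds to scikit-learn's `GridSearchCV`. The sketch below runs it on synthetic data; the parameter grid, `cv=5`, and the data are assumptions of this example and not the settings used in the report.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                 # synthetic stand-in features
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)    # labels from a simple linear rule

# Assumed grid: 3 values of C times 3 values of gamma = 9 combinations.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}

# For each combination, 5-fold cross-validation accuracy is averaged;
# the combination with the highest mean score is kept.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Keeping printed messages and parameter names ASCII-only inside the search is one way to rule out encoding problems while debugging a failing run.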

The output is:

步骤6: 自动超参数调优

对通用数据集进行调参...

开始自动调参...

自动调参失败: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

(8) Effect of parameters on performance

步骤7: 参数对性能的影响

(9) Learning-curve analysis

步骤8: 学习曲线分析

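(10) Generating the 3D visualization (web version)

3D visualization

Mathematical model: PCA or t-SNE reduces the data to three dimensions so that the data distribution and decision boundary can be shown in 3D.

PCA dimensionality reduction:

1. Compute the covariance matrix $C = \frac{1}{n}X^TX$, with the columns of $X$ centered to zero mean

2. Eigendecompose the covariance matrix: $C = V\Lambda V^T$

3. Take the eigenvectors of the three largest eigenvalues, $V_{1:3}$

4. Project onto them: $X_{3D} = XV_{1:3}$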
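The four PCA steps can be sketched with NumPy alone. The function name `pca_project` and the random input are assumptions for illustration; `np.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

def pca_project(X, n_components=3):
    """Project X onto the eigenvectors of its covariance with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                    # center each column
    cov = Xc.T @ Xc / Xc.shape[0]              # step 1: C = X^T X / n
    eigvals, eigvecs = np.linalg.eigh(cov)     # step 2: eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    V = eigvecs[:, order]                      # step 3: top eigenvectors
    return Xc @ V, eigvals[order]              # step 4: projection

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 11)) * np.arange(1, 12)  # columns with different scales
Z, top_eigvals = pca_project(X_demo)
print(Z.shape)  # (100, 3)
```

The variance of each projected column equals the corresponding eigenvalue, which gives a quick sanity check on the implementation.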

The output is:

步骤9: 生成3D可视化

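(11) Creating the decision-boundary animation (web version)

Animated visualization

Mathematical model: an animation that moves smoothly between the decision boundaries of different kernels

Procedure:

1. Compute the decision function $f_k(x, y)$ for each kernel, where $(x, y)$ are the two plotted coordinates

2. Blend them with weight functions $w_k(t)$ to obtain a smooth transition: $f(x, y, t) = \sum_k w_k(t) f_k(x, y)$

3. Render a sequence of frames over $t \in [0, 1]$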
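The report does not state which weight functions $w_k(t)$ it uses. One simple choice, assumed here purely for illustration, is piecewise-linear weights that sum to 1 and give kernel $k$ full weight at $t = k/(K-1)$. The names `blend_weights` and `blended_decision` are hypothetical.

```python
import numpy as np

def blend_weights(t, n_kernels):
    """Piecewise-linear weights: kernel k has weight 1 at t = k / (n_kernels - 1)."""
    pos = t * (n_kernels - 1)
    return np.maximum(0.0, 1.0 - np.abs(pos - np.arange(n_kernels)))

def blended_decision(t, fields):
    """f(x, y, t) = sum_k w_k(t) * f_k(x, y) for decision values sampled on a grid."""
    w = blend_weights(t, len(fields))
    return sum(wk * fk for wk, fk in zip(w, fields))

# Three made-up 2x2 decision-value grids standing in for three kernels.
fields = [np.full((2, 2), 1.0), np.full((2, 2), -1.0), np.full((2, 2), 3.0)]
w_quarter = blend_weights(0.25, 3)
frame = blended_decision(0.25, fields)
print(w_quarter.tolist())  # [0.5, 0.5, 0.0]
print(frame.tolist())      # [[0.0, 0.0], [0.0, 0.0]]
```

Each animation frame would then draw the zero contour of the blended grid for one value of $t$.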

The output is:

步骤10: 创建决策边界动画

(12) Generating the overall performance report (web version)

步骤11: 生成综合性能报告

=== 所有演示完成 ===

最佳模型参数: {'kernel': 'linear', 'C': 1.0}

数据处理、模型训练、可视化和性能评估已全部完成!

III. Python Implementation of SVM-Based Classification

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from mpl_toolkits.mplot3d import Axes3D
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, roc_curve, auc, accuracy_score, precision_score, recall_score, f1_score
from sklearn.impute import SimpleImputer  # 导入缺失值处理模块
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import seaborn as sns
import os
import warnings

warnings.filterwarnings('ignore')

# Enable Chinese font support in matplotlib
plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly


# ==================== CSV data loading ====================
def load_csv_with_specific_columns(file_paths, selected_columns=[4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], skip_header=True):
    """Load selected columns from one or more CSV files.

    Args:
        file_paths: str or list, path(s) of the CSV file(s).
        selected_columns: list of column indices to keep (columns 5-16 of the
            file correspond to indices 4-15); the last index is the label column.
        skip_header: bool, whether to drop the first row after loading.

    Returns:
        X: np.array, feature data.
        y: np.array, label data (Cash = -1, Credit Card = 1).
    """
    all_data = []
    all_labels = []
    # Accept a single path as well as a list of paths
    if isinstance(file_paths, str):
        file_paths = [file_paths]
    for file_path in file_paths:
        try:
            # Read the CSV file
            df = pd.read_csv(file_path)
            print(f"文件 {file_path} 原始形状: {df.shape}")
            print("前几行数据:")
            print(df.head())
            print("数据类型:")
            print(df.dtypes)
            # Drop the first row if requested. Note: pd.read_csv has already
            # consumed the header line, so this removes the first data record
            # (2001 rows become 2000 in the log above).
            if skip_header:
                df = df.iloc[1:]
            # Check that the column indices are valid
            max_col = max(selected_columns) if selected_columns else df.shape[1] - 1
            if max_col >= df.shape[1]:
                print(f"警告: 文件 {file_path} 列数不足,最大列索引: {df.shape[1] - 1}")
                # Keep only the indices that exist in this file
                valid_columns = [col for col in selected_columns if col < df.shape[1]]
            else:
                valid_columns = selected_columns
            # At least one feature column plus the label column is required
            if len(valid_columns) < 2:
                print(f"警告: 有效列数不足,至少需要一个特征列和一个标签列")
                continue
            # Split the feature columns from the label column
            feature_columns = valid_columns[:-1]
            label_column = valid_columns[-1]
            # Feature columns: numeric processing
            X_data = process_feature_columns(df, feature_columns)
            # Label column: keep the original strings
            labels_data = df.iloc[:, label_column].values
            all_data.append(X_data)
            all_labels.extend(labels_data)
            print(f"成功加载文件: {file_path}, 特征数据形状: {X_data.shape}, 标签数量: {len(labels_data)}")
        except Exception as e:
            print(f"加载文件 {file_path} 失败: {e}")
            continue
    if not all_data:
        print("未能加载任何数据,生成模拟数据...")
        return generate_sample_data()
    # Merge the feature data
    X = np.vstack(all_data) if len(all_data) > 1 else all_data[0]
    # Convert the label list to an array
    labels = np.array(all_labels)
    # Map the labels (no mean filling for labels)
    y = process_labels(labels)
    print(f"数据处理完成 - 特征维度: {X.shape}, 标签分布: 负类(-1): {np.sum(y == -1)}, 正类(1): {np.sum(y == 1)}")
    return X, y


def process_feature_columns(df, feature_columns):
    """Process the feature columns as numeric data.

    Args:
        df: DataFrame, input data.
        feature_columns: list, indices of the feature columns.

    Returns:
        X: np.array, processed feature data.
    """
    # Extract the feature columns
    features_df = df.iloc[:, feature_columns]
    # Process each column
    processed_features = []
    for col_idx, col in enumerate(features_df.columns):
        series = features_df[col]
        # Convert to numeric and fill missing values
        numeric_series, _ = convert_to_numeric(series, f"特征列 {col}")
        processed_features.append(numeric_series)
    # Stack the processed columns side by side
    X = np.column_stack(processed_features)
    return X


def convert_to_numeric(series, col_name):
    """Convert a pandas Series to numeric values.

    Args:
        series: pandas Series.
        col_name: column name (used in log messages).

    Returns:
        numeric_array: array of numeric values.
        conversion_info: short status string.
    """
    print(f"处理列 '{col_name}', 原始类型: {series.dtype}")
    try:
        # Unparseable values become NaN
        numeric_series = pd.to_numeric(series, errors='coerce')
        # Count missing values after the conversion
        nan_count = numeric_series.isna().sum()
        if nan_count > 0:
            print(f"列 '{col_name}' 中有 {nan_count} 个值无法转换为数字,将使用均值填充")
            if not numeric_series.isna().all():
                # Fill NaN with the mean of the observed values
                mean_value = numeric_series.mean()
                numeric_series = numeric_series.fillna(mean_value)
            else:
                print(f"列 '{col_name}' 全部为非数值,使用0填充")
                numeric_series = numeric_series.fillna(0)
        return numeric_series.values, "数值转换成功"
    except Exception as e:
        print(f"列 '{col_name}' 数值转换失败: {e}")
        # String columns get a dedicated fallback
        if series.dtype == 'object':
            return process_numeric_object_column(series, col_name)
        else:
            # Last resort: all zeros
            print(f"对列 '{col_name}' 使用默认值0")
            return np.zeros(len(series)), "使用默认值"


def process_numeric_object_column(series, col_name):
    """Convert an object-typed (usually string) feature column to numbers."""
    print(f"处理object类型特征列 '{col_name}'")
    # Inspect the unique values
    unique_values = series.unique()
    if len(unique_values) < 10:
        print(f"列 '{col_name}' 的唯一值: {unique_values}")
    else:
        print(f"列 '{col_name}' 有 {len(unique_values)} 个唯一值")
    # Map each entry to a number
    result = []
    for value in series:
        if pd.isna(value) or value is None:
            result.append(0)  # NaN is replaced by 0
        elif isinstance(value, str):
            # Try to extract a number from the string
            numeric_value = extract_number_from_string(value)
            result.append(numeric_value)
        else:
            try:
                result.append(float(value))
            except Exception:
                result.append(0)
    return np.array(result), "字符串处理完成"


def extract_number_from_string(s):
    """Extract a number from a string."""
    if not isinstance(s, str):
        return 0
    # Strip surrounding whitespace
    s = s.strip()
    # Common string-to-number mappings
    string_to_number = {
        'cash': -1,
        'credit': 1,
        'credit card': 1,
        'debit': 0,
        'yes': 1,
        'no': 0,
        'true': 1,
        'false': 0,
        'male': 1,
        'female': 0,
        'high': 1,
        'low': -1,
        'medium': 0
    }
    s_lower = s.lower()
    if s_lower in string_to_number:
        return string_to_number[s_lower]
    # Try to pull the first number out of the string
    import re
    numbers = re.findall(r'-?\d+\.?\d*', s)
    if numbers:
        try:
            return float(numbers[0])
        except Exception:
            pass
    # Fallback: a hash-based value in [0, 1). Note that Python randomizes
    # str hashes per process unless PYTHONHASHSEED is set, so this value
    # is not reproducible across runs.
    return hash(s) % 1000 / 1000.0


def process_labels(labels):
    """Map the label data; labels stay strings on input and are never mean-filled."""
    print(f"处理标签数据,类型: {type(labels)}, 形状: {labels.shape if hasattr(labels, 'shape') else len(labels)}")
    # Inspect the unique label values
    if isinstance(labels, np.ndarray):
        unique_labels = np.unique(labels)
    else:
        unique_labels = pd.Series(labels).unique()
    if len(unique_labels) < 10:
        print(f"标签的唯一值: {unique_labels}")
    else:
        print(f"标签有 {len(unique_labels)} 个唯一值")
    # Convert the labels
    y = []
    for label in labels:
        # Skip samples whose label is missing. Note: the matching row of X is
        # not removed by the caller, so X and y can end up with different
        # lengths if any label is missing.
        if pd.isna(label) or label is None:
            continue
        mapped_label = map_payment_label(label)
        y.append(mapped_label)
    # Report the label distribution after mapping
    y_array = np.array(y)
    print(f"处理后的标签分布: -1 (Cash): {np.sum(y_array == -1)}, 1 (Credit Card): {np.sum(y_array == 1)}")
    return y_array


def map_payment_label(label):
    """Map a payment-type label to -1 or 1.

    Cash/cash -> -1
    Credit Card/credit/credit card -> 1
    """
    if isinstance(label, str):
        label_lower = label.strip().lower()
        # Case-insensitive substring match (this also covers the exact
        # strings 'Cash' and 'Credit Card')
        if 'cash' in label_lower:
            return -1
        elif 'credit' in label_lower:
            return 1
        # Other common string values
        if label_lower in ['0', 'false', 'no', 'negative', 'failure', 'fail', 'n']:
            return -1
        elif label_lower in ['1', 'true', 'yes', 'positive', 'success', 'pass', 'y']:
            return 1
    else:
        # Numeric labels
        try:
            num_label = float(label)
            # Explicit -1 / 1 values are used as they are
            if num_label == -1:
                return -1
            elif num_label == 1:
                return 1
            # Any other number follows the sign rule
            return -1 if num_label <= 0 else 1
        except Exception:
            pass
    # Default for unrecognized labels: any other payment type is counted as 1
    return 1


def generate_linear_data(n_samples=200):
    """Generate a linearly separable data set (data1)."""
    np.random.seed(42)
    X = np.random.randn(n_samples, 2) * 2
    # Linear decision boundary: x + y > 0
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    # Flip 5% of the labels as noise
    noise_idx = np.random.choice(n_samples, size=int(0.05 * n_samples), replace=False)
    y[noise_idx] = -y[noise_idx]
    print("已生成线性可分模拟数据")
    return X, y


def generate_spiral_data(n_samples=200):
    """Generate a two-spiral data set (data2)."""
    np.random.seed(42)

    def spiral_xy(i, spiral_num):
        """Return the coordinates of point i on the given spiral."""
        angle = i * np.pi / 16
        radius = 2 * i / n_samples
        if spiral_num == 0:
            return [radius * np.cos(angle), radius * np.sin(angle)]
        else:
            return [-radius * np.cos(angle), -radius * np.sin(angle)]

    half_samples = n_samples // 2
    X = np.zeros((n_samples, 2))
    y = np.zeros(n_samples)
    # First spiral (class 1)
    for i in range(half_samples):
        X[i] = spiral_xy(i, 0)
        y[i] = 1
    # Second spiral (class -1)
    for i in range(half_samples):
        X[i + half_samples] = spiral_xy(i, 1)
        y[i + half_samples] = -1
    # Add noise
    X += np.random.randn(n_samples, 2) * 0.1
    print("已生成螺旋形模拟数据")
    return X, y


def generate_sample_data(n_samples=200, n_features=8):
    """Generate generic synthetic data."""
    np.random.seed(42)
    X = np.random.randn(n_samples, n_features)
    y = np.where(X[:, 0] + X[:, 1] + 0.3 * X[:, 2] > 0, 1, -1)
    print("已生成常规模拟数据")
    return X, y


# ==================== Data preprocessing ====================
def preprocess_data(X, y):
    """Preprocess the data: handle missing values and scale the features.

    Args:
        X: feature data.
        y: label data.

    Returns:
        X_scaled: preprocessed feature data.
        y: preprocessed label data.
    """
    # 1. Fill missing feature values with the column mean
    if np.isnan(X).any():
        imputer = SimpleImputer(strategy='mean')
        X = imputer.fit_transform(X)
        print("已使用均值填充特征中的缺失值")
    # 2. Drop samples whose label is missing
    valid_indices = ~np.isnan(y)
    if not valid_indices.all():
        X = X[valid_indices]
        y = y[valid_indices]
        print(f"已移除 {np.sum(~valid_indices)} 个标签缺失的样本")
    # 3. Standardize the features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    return X_scaled, y


# ==================== Evaluation metrics ====================
def calculate_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1."""
    true_positives = np.sum((y_true == 1) & (y_pred == 1))
    true_negatives = np.sum((y_true == -1) & (y_pred == -1))
    false_positives = np.sum((y_true == -1) & (y_pred == 1))
    false_negatives = np.sum((y_true == 1) & (y_pred == -1))
    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
    accuracy = np.sum(y_true == y_pred) / len(y_true)
    return accuracy, precision, recall, f1


# ==================== SMO algorithm ====================
def SMO(x, y, ker, C, max_iter, tol=1e-3):
    """Train an SVM with the SMO algorithm."""
    m = x.shape[0]
    alpha = np.zeros(m)
    b = 0
    passes = 0
    # Precompute the kernel matrix
    K = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            K[i, j] = ker(x[i], x[j])
    # Main loop: stop after max_iter consecutive passes without any change
    while passes < max_iter:
        num_changed_alphas = 0
        for i in range(m):
            Ei = np.sum(alpha * y * K[:, i]) + b - y[i]
            # Only optimize alpha_i if it violates the KKT conditions
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                # Pick the second multiplier at random, j != i
                j = np.random.choice([l for l in range(m) if l != i])
                Ej = np.sum(alpha * y * K[:, j]) + b - y[j]
                alpha_i_old = alpha[i]
                alpha_j_old = alpha[j]
                # Bounds on alpha_j
                if y[i] != y[j]:
                    L = max(0, alpha[j] - alpha[i])
                    H = min(C, C + alpha[j] - alpha[i])
                else:
                    L = max(0, alpha[i] + alpha[j] - C)
                    H = min(C, alpha[i] + alpha[j])
                if L == H:
                    continue
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if eta >= 0:
                    continue
                # Update and clip alpha_j
                alpha[j] = alpha[j] - (y[j] * (Ei - Ej)) / eta
                alpha[j] = np.clip(alpha[j], L, H)
                if abs(alpha[j] - alpha_j_old) < tol:
                    continue
                # Update alpha_i in the opposite direction
                alpha[i] = alpha[i] + y[i] * y[j] * (alpha_j_old - alpha[j])
                # Update the intercept
                b1 = b - Ei - y[i] * (alpha[i] - alpha_i_old) * K[i, i] - y[j] * (alpha[j] - alpha_j_old) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - alpha_i_old) * K[i, j] - y[j] * (alpha[j] - alpha_j_old) * K[j, j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                num_changed_alphas += 1
        if num_changed_alphas == 0:
            passes += 1
        else:
            passes = 0
    return alpha, b


# ==================== Kernel functions ====================
def linear_kernel(x, y):
    """Linear kernel."""
    return np.inner(x, y)


def polynomial_kernel(d):
    """Polynomial kernel of degree d.

    Note: this version has no +1 offset, unlike the polynomial kernel
    formula given in Section II.
    """
    def kernel(x, y):
        return np.inner(x, y) ** d
    return kernel


def rbf_kernel(sigma):
    """RBF kernel with bandwidth sigma."""
    def kernel(x, y):
        return np.exp(-np.inner(x - y, x - y) / (2.0 * sigma ** 2))
    return kernel


def cosine_kernel(x, y):
    """Cosine-similarity kernel."""
    return np.inner(x, y) / (np.linalg.norm(x, 2) * np.linalg.norm(y, 2) + 1e-10)


def sigmoid_kernel(beta, c):
    """Sigmoid kernel."""
    def kernel(x, y):
        return np.tanh(beta * np.inner(x, y) + c)
    return kernel


# ==================== Enhanced visualization ====================
def plot_decision_boundary_enhanced(X, y, model, title=None, ax=None, alpha=0.8,
                                    show_support_vectors=True, confidence=True,
                                    show_margin=True, point_size=60):
    """Plot an enhanced view of the decision boundary.

    Parameters:
        X: feature matrix
        y: labels
        model: fitted SVM model
        title: plot title
        ax: matplotlib axes
        alpha: transparency
        show_support_vectors: whether to mark the support vectors
        confidence: whether to shade regions by confidence
        show_margin: whether to draw the margin lines
        point_size: marker size of the data points
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(10, 8))
    # Use the first two features
    X_2d = X[:, :2] if X.shape[1] > 2 else X
    # Build the grid
    x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
    y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    # Predict on the grid points
    if X.shape[1] > 2:
        # Build grid points with the same dimensionality as the original features
        grid = np.zeros((xx.size, X.shape[1]))
        grid[:, 0] = xx.ravel()
        grid[:, 1] = yy.ravel()
        # Fill the remaining features with their means
        for i in range(2, X.shape[1]):
            grid[:, i] = X[:, i].mean()
    else:
        grid = np.c_[xx.ravel(), yy.ravel()]
    try:
        # Decision-function values (signed score relative to the hyperplane)
        Z = model.decision_function(grid).reshape(xx.shape)
        # Predicted labels
        Z_pred = model.predict(grid).reshape(xx.shape)
        if confidence:
            # Shade the decision regions by the absolute decision value
            abs_Z = np.abs(Z)
            max_abs_Z = abs_Z.max()
            # Normalized confidence in the range 0-1
            conf = abs_Z / max_abs_Z
            # One colormap per class
            cmap_blue = plt.cm.Blues
            cmap_red = plt.cm.Reds
            # Mask out the other class so the two layers do not wash each other out
            region_a = np.ma.masked_where(Z_pred != 1, conf)
            region_b = np.ma.masked_where(Z_pred != -1, conf)
            # Draw the regions with a confidence gradient
            ax.imshow(region_a, cmap=cmap_blue, alpha=alpha,
                      extent=(x_min, x_max, y_min, y_max), origin='lower', aspect='auto')
            ax.imshow(region_b, cmap=cmap_red, alpha=alpha,
                      extent=(x_min, x_max, y_min, y_max), origin='lower', aspect='auto')
        else:
            # Plain two-class regions
            ax.contourf(xx, yy, Z_pred, alpha=alpha, cmap=ListedColormap(['#FFAAAA', '#AAAAFF']))
        # Draw the decision boundary and the margin lines
        if show_margin:
            ax.contour(xx, yy, Z, levels=[-1, 0, 1], colors=['red', 'black', 'blue'],
                       linestyles=['--', '-', '--'], linewidths=[1, 2, 1])
        else:
            ax.contour(xx, yy, Z, levels=[0], colors=['black'],
                       linestyles=['-'], linewidths=[2])
    except Exception as e:
        print(f"Error while drawing the decision boundary: {e}")
        # Draw only the data points, without the decision boundary
        ax.text(0.5, 0.5, "Failed to draw the decision boundary",
                ha='center', va='center', transform=ax.transAxes,
                bbox=dict(facecolor='red', alpha=0.1))
    # Draw the data points
    scatter = ax.scatter(X_2d[:, 0], X_2d[:, 1], c=y, cmap=ListedColormap(['red', 'blue']),
                         s=point_size, edgecolors='k', alpha=0.8)
    # Mark the support vectors
    if show_support_vectors and hasattr(model, 'support_vectors_'):
        sv = model.support_vectors_
        if sv.shape[1] > 2:
            sv = sv[:, :2]  # keep only the first two dimensions
        ax.scatter(sv[:, 0], sv[:, 1],
                   s=point_size * 2, linewidth=1, facecolors='none', edgecolors='green')
    # Title and axis labels
    if title:
        ax.set_title(title, fontsize=14)
    else:
        kernel_type = model.kernel if hasattr(model, 'kernel') else 'unknown'
        ax.set_title(f'SVM (kernel={kernel_type})', fontsize=14)
    ax.set_xlabel('Feature 1', fontsize=12)
    ax.set_ylabel('Feature 2', fontsize=12)
    # Axis limits
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    # Legends
    handles, labels = scatter.legend_elements()
    class_labels = ['Class -1', 'Class 1']
    legend1 = ax.legend(handles, class_labels, loc="upper right")
    ax.add_artist(legend1)
    if show_support_vectors and hasattr(model, 'support_vectors_'):
        sv_handle = plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='none',
                               markeredgecolor='green', markersize=10, linewidth=0)
        ax.legend([sv_handle], ['Support vectors'], loc='upper left')
    # Grid
    ax.grid(True, linestyle='--', alpha=0.3)
    return ax


def create_3d_visualization_advanced(X, y, method='pca', model=None, title_suffix="",
                                     show_decision_surface=True):
    """Enhanced 3D visualization that can show the decision surface and support vectors."""
    # Make sure there are no NaN values
    if np.isnan(X).any():
        print("Warning: the 3D visualization data contains NaN values; filling with column means")
        imputer = SimpleImputer(strategy='mean')
        X = imputer.fit_transform(X)
    # Reduce to 3D
    if method == 'pca':
        # PCA down to 3 dimensions
        pca = PCA(n_components=min(3, X.shape[1]))
        X_3d = pca.fit_transform(X)
        title = f"PCA 3D visualization {title_suffix}"
        explained_var = pca.explained_variance_ratio_
        axis_labels = [f'PC{i + 1} ({explained_var[i]:.1%})' for i in range(min(3, X.shape[1]))]
    else:
        # t-SNE down to 3 dimensions
        n_components = min(3, X.shape[1])
        perplexity = min(30, len(X) // 4) if len(X) > 12 else 3
        tsne = TSNE(n_components=n_components, random_state=42, perplexity=perplexity)
        X_3d = tsne.fit_transform(X)
        title = f"t-SNE 3D visualization {title_suffix}"
        axis_labels = [f't-SNE {i + 1}' for i in range(n_components)]
    # Pad with zero columns if there are fewer than 3 dimensions
    if X_3d.shape[1] < 3:
        pad = np.zeros((X_3d.shape[0], 3 - X_3d.shape[1]))
        X_3d = np.hstack((X_3d, pad))
        for i in range(X_3d.shape[1] - len(axis_labels)):
            axis_labels.append(f'Padding dimension {i + 1}')
    # 3D scatter plot
    fig = go.Figure()
    # Add the decision surface (if requested and a model is available)
    if show_decision_surface and model is not None and X.shape[1] >= 3:
        try:
            # Build the grid in the reduced space
            x_min, x_max = X_3d[:, 0].min() - 0.5, X_3d[:, 0].max() + 0.5
            y_min, y_max = X_3d[:, 1].min() - 0.5, X_3d[:, 1].max() + 0.5
            xx, yy = np.meshgrid(np.linspace(x_min, x_max, 30),
                                 np.linspace(y_min, y_max, 30))
            # The surface can only be mapped back to the original space with PCA
            if method == 'pca':
                grid = np.c_[xx.ravel(), yy.ravel(), np.zeros(xx.size)]
                # Pick the third coordinate so that the point lies near the decision boundary.
                # This is a simplification; real applications may need a more careful method.
                z_vals = []
                for i in range(grid.shape[0]):
                    # Try a few candidate z values
                    z_test = np.linspace(X_3d[:, 2].min(), X_3d[:, 2].max(), 5)
                    decision_vals = []
                    for z in z_test:
                        point_3d = np.array([[grid[i, 0], grid[i, 1], z]])
                        try:
                            # Project the 3D point back to the original space
                            point_orig = pca.inverse_transform(point_3d)
                            decision_vals.append(model.decision_function(point_orig)[0])
                        except Exception:
                            decision_vals.append(float('inf'))
                    # Keep the z value closest to the decision boundary
                    idx = np.argmin(np.abs(decision_vals))
                    z_vals.append(z_test[idx])
                grid[:, 2] = np.array(z_vals)
                # Reshape to the grid
                z = grid[:, 2].reshape(xx.shape)
                # Add the decision surface
                fig.add_trace(go.Surface(
                    x=xx, y=yy, z=z,
                    colorscale='RdBu',
                    opacity=0.7,
                    showscale=False,
                    name='Decision surface'
                ))
        except Exception as e:
            print(f"Failed to create the 3D decision surface: {e}")
    # Add the data points
    for class_val in np.unique(y):
        mask = y == class_val
        name = "Negative class" if class_val == -1 else "Positive class"
        color = 'red' if class_val == -1 else 'blue'
        fig.add_trace(go.Scatter3d(
            x=X_3d[mask, 0],
            y=X_3d[mask, 1],
            z=X_3d[mask, 2],
            mode='markers',
            marker=dict(size=5, color=color, opacity=0.8),
            name=name,
            text=[f'Sample {i}, class: {"negative" if label == -1 else "positive"}'
                  for i, label in enumerate(y[mask])],
            hovertemplate='%{text}<br>x: %{x:.2f}<br>y: %{y:.2f}<br>z: %{z:.2f}<extra></extra>'
        ))
    # If a model is given, add its support vectors
    if model is not None and hasattr(model, 'support_vectors_'):
        try:
            # Map the support vectors into the reduced space
            if method == 'pca':
                sv_3d = pca.transform(model.support_vectors_)
                # Pad with zero columns if there are fewer than 3 dimensions
                if sv_3d.shape[1] < 3:
                    pad = np.zeros((sv_3d.shape[0], 3 - sv_3d.shape[1]))
                    sv_3d = np.hstack((sv_3d, pad))
            else:
                # t-SNE has no transform(); as a simple workaround, use the
                # training sample closest to each support vector
                sv_3d = np.zeros((len(model.support_vectors_), 3))
                for i, sv in enumerate(model.support_vectors_):
                    # Find the nearest original sample
                    distances = np.sum((X - sv) ** 2, axis=1)
                    nearest_idx = np.argmin(distances)
                    sv_3d[i] = X_3d[nearest_idx]
            # Add the support vectors
            fig.add_trace(go.Scatter3d(
                x=sv_3d[:, 0],
                y=sv_3d[:, 1],
                z=sv_3d[:, 2],
                mode='markers',
                marker=dict(size=8, color='green', symbol='circle',
                            line=dict(color='green', width=2), opacity=0.9),
                name='Support vectors'
            ))
        except Exception as e:
            print(f"Error while adding the support vectors: {e}")
    # Update the layout
    fig.update_layout(
        title=title,
        scene=dict(xaxis_title=axis_labels[0],
                   yaxis_title=axis_labels[1],
                   zaxis_title=axis_labels[2]),
        width=900,
        height=700,
        margin=dict(l=0, r=0, b=0, t=40)
    )
    return fig


def create_animated_decision_boundary(X, y, models, model_names, steps=50):
    """Create an animation that blends the decision boundaries of different kernels."""
    try:
        # Use the first two features
        X_2d = X[:, :2] if X.shape[1] > 2 else X
        # Build the grid
        x_min, x_max = X_2d[:, 0].min() - 1, X_2d[:, 0].max() + 1
        y_min, y_max = X_2d[:, 1].min() - 1, X_2d[:, 1].max() + 1
        xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                             np.linspace(y_min, y_max, 100))
        # For higher-dimensional data, build a grid with the full dimensionality
        if X.shape[1] > 2:
            grid = np.zeros((xx.size, X.shape[1]))
            grid[:, 0] = xx.ravel()
            grid[:, 1] = yy.ravel()
            # Fill the remaining dimensions with their means
            for i in range(2, X.shape[1]):
                grid[:, i] = X[:, i].mean()
        else:
            grid = np.c_[xx.ravel(), yy.ravel()]
        # Evaluate the decision function of every model
        Z_values = []
        Z_pred_values = []
        valid_models = []
        valid_model_names = []
        for i, (model, name) in enumerate(zip(models, model_names)):
            try:
                Z = model.decision_function(grid).reshape(xx.shape)
                Z_pred = model.predict(grid).reshape(xx.shape)
                Z_values.append(Z)
                Z_pred_values.append(Z_pred)
                valid_models.append(model)
                valid_model_names.append(name)
            except Exception as e:
                print(f"Model {name} cannot compute a decision boundary: {e}")
        # Without any valid model, return None
        if not valid_models:
            print("No usable model for the animation")
            return None
        # Build the animation frames
        frames = []
        for step in range(steps):
            # Interpolation weights
            weights = [np.sin(np.pi * (step / steps + i / len(valid_models))) ** 2
                       for i in range(len(valid_models))]
            weights = np.array(weights) / sum(weights)  # normalize the weights
            # Blend the decision functions
            Z_mix = np.zeros_like(Z_values[0])
            for i, Z in enumerate(Z_values):
                Z_mix += weights[i] * Z
            # Predictions follow the model with the largest weight
            max_weight_idx = np.argmax(weights)
            Z_pred_mix = Z_pred_values[max_weight_idx]
            # Build the frame
            frame = go.Frame(
                data=[
                    # Data points
                    go.Scatter(
                        x=X_2d[:, 0],
                        y=X_2d[:, 1],
                        mode='markers',
                        marker=dict(size=8,
                                    color=['red' if label == -1 else 'blue' for label in y],
                                    line=dict(width=1, color='black')),
                        showlegend=False,
                    ),
                    # Decision-function heat map
                    go.Contour(
                        z=Z_mix,
                        x=np.linspace(x_min, x_max, 100),
                        y=np.linspace(y_min, y_max, 100),
                        colorscale='RdBu',
                        showscale=False,
                        contours=dict(start=-2, end=2, size=0.5, showlabels=False),
                        line=dict(width=1),
                        opacity=0.8
                    ),
                    # Decision boundary line
                    go.Contour(
                        z=Z_mix,
                        x=np.linspace(x_min, x_max, 100),
                        y=np.linspace(y_min, y_max, 100),
                        colorscale=[[0, 'black'], [1, 'black']],
                        showscale=False,
                        contours=dict(start=0, end=0, size=1, showlabels=False),
                        line=dict(width=2),
                        opacity=1
                    )
                ],
                name=f"frame{step}"
            )
            frames.append(frame)
        # Build the base figure
        fig = go.Figure(
            data=[
                # Data points
                go.Scatter(
                    x=X_2d[:, 0],
                    y=X_2d[:, 1],
                    mode='markers',
                    marker=dict(size=8,
                                color=['red' if label == -1 else 'blue' for label in y],
                                line=dict(width=1, color='black')),
                    name='Data points'
                ),
                # Initial decision function
                go.Contour(
                    z=Z_values[0],
                    x=np.linspace(x_min, x_max, 100),
                    y=np.linspace(y_min, y_max, 100),
                    colorscale='RdBu',
                    showscale=False,
                    contours=dict(start=-2, end=2, size=0.5, showlabels=False),
                    line=dict(width=1),
                    opacity=0.8,
                    name='Decision function'
                ),
                # Initial decision boundary line
                go.Contour(
                    z=Z_values[0],
                    x=np.linspace(x_min, x_max, 100),
                    y=np.linspace(y_min, y_max, 100),
                    colorscale=[[0, 'black'], [1, 'black']],
                    showscale=False,
                    contours=dict(start=0, end=0, size=1, showlabels=False),
                    line=dict(width=2),
                    opacity=1,
                    name='Decision boundary'
                )
            ],
            frames=frames,
            layout=go.Layout(
                title="SVM decision boundary animation",
                xaxis=dict(range=[x_min, x_max], title="Feature 1"),
                yaxis=dict(range=[y_min, y_max], title="Feature 2"),
                updatemenus=[{
                    "type": "buttons",
                    "buttons": [
                        {"label": "Play",
                         "method": "animate",
                         "args": [None, {"frame": {"duration": 100, "redraw": True}}]},
                        {"label": "Pause",
                         "method": "animate",
                         "args": [[None], {"frame": {"duration": 0, "redraw": True}}]}
                    ],
                    "direction": "left",
                    "pad": {"r": 10, "t": 10},
                    "x": 0.1,
                    "y": 0,
                    "xanchor": "right",
                    "yanchor": "top"
                }],
                sliders=[{
                    "steps": [
                        {"args": [[f"frame{k}"],
                                  {"frame": {"duration": 100, "redraw": True}}],
                         "label": str(valid_model_names[i % len(valid_model_names)]),
                         "method": "animate"}
                        for k, i in zip(range(0, steps, steps // len(valid_model_names)),
                                        range(len(valid_model_names)))
                    ],
                    "x": 0.1,
                    "y": 0,
                    "currentvalue": {"font": {"size": 12},
                                     "prefix": "Model: ",
                                     "visible": True,
                                     "xanchor": "center"},
                    "len": 0.9,
                    "pad": {"b": 10, "t": 50},
                    "transition": {"duration": 300}
                }]
            )
        )
        return fig
    except Exception as e:
        print(f"Error while creating the animation: {e}")
        return None


def visualize_metrics_over_C_gamma(X_train, y_train, X_test, y_test, kernel='rbf'):
    """Visualize how C and gamma affect the model metrics.

    Parameters:
        X_train, y_train: training data
        X_test, y_test: test data
        kernel: kernel type
    """
    try:
        # Grid of C values
        C_range = np.logspace(-3, 3, 7)
        # Grid of gamma values (non-linear kernels only)
        if kernel != 'linear':
            gamma_range = np.logspace(-3, 2, 6)
        else:
            gamma_range = [0.01]  # the linear kernel ignores gamma; a placeholder keeps the loops uniform
        # Metrics for every parameter combination
        results = []
        # Train and evaluate a model per combination
        for C in C_range:
            for gamma in gamma_range:
                try:
                    if kernel == 'linear':
                        model = SVC(kernel=kernel, C=C, probability=True)
                    else:
                        model = SVC(kernel=kernel, C=C, gamma=gamma, probability=True)
                    # Train the model
                    model.fit(X_train, y_train)
                    # Evaluate on the test set
                    y_pred = model.predict(X_test)
                    accuracy, precision, recall, f1 = calculate_metrics(y_test, y_pred)
                    # Record the result
                    results.append({'C': C, 'gamma': gamma,
                                    'accuracy': accuracy, 'precision': precision,
                                    'recall': recall, 'f1': f1})
                except Exception as e:
                    print(f"Training with C={C}, gamma={gamma} failed: {e}")
                    # Record a placeholder result
                    results.append({'C': C, 'gamma': gamma,
                                    'accuracy': 0, 'precision': 0,
                                    'recall': 0, 'f1': 0})
        # Convert to a DataFrame
        results_df = pd.DataFrame(results)
        # Build the figure
        if kernel != 'linear' and len(gamma_range) > 1:
            # 3D surfaces: C, gamma vs. each metric
            fig = plt.figure(figsize=(18, 10))
            metrics = ['accuracy', 'precision', 'recall', 'f1']
            titles = ['Accuracy', 'Precision', 'Recall', 'F1 score']
            for i, (metric, title) in enumerate(zip(metrics, titles)):
                ax = fig.add_subplot(2, 2, i + 1, projection='3d')
                try:
                    # Reshape the data for the surface plot
                    pivoted = results_df.pivot_table(values=metric, index='C', columns='gamma')
                    X, Y = np.meshgrid(np.log10(gamma_range), np.log10(C_range))
                    Z = pivoted.values
                    # Draw the surface
                    surf = ax.plot_surface(X, Y, Z, cmap='viridis',
                                           linewidth=0, antialiased=True, alpha=0.8)
                    # Title and labels
                    ax.set_title(f'{kernel} kernel: {title} vs C, gamma')
                    ax.set_xlabel('log10(gamma)')
                    ax.set_ylabel('log10(C)')
                    ax.set_zlabel(title)
                    # Color bar
                    fig.colorbar(surf, ax=ax, shrink=0.5, aspect=5)
                except Exception as e:
                    print(f"Failed to draw the 3D surface: {e}")
                    ax.text2D(0.5, 0.5, "Plot failed",
                              ha='center', transform=ax.transAxes,
                              bbox=dict(facecolor='red', alpha=0.1))
            plt.tight_layout()
            plt.show()
        else:
            # For the linear kernel, show only the effect of C
            plt.figure(figsize=(15, 5))
            metrics = ['accuracy', 'precision', 'recall', 'f1']
            titles = ['Accuracy', 'Precision', 'Recall', 'F1 score']
            for i, (metric, title) in enumerate(zip(metrics, titles)):
                plt.subplot(1, 4, i + 1)
                plt.semilogx(results_df['C'], results_df[metric], marker='o', linewidth=2)
                plt.title(f'Linear kernel: {title} vs C')
                plt.xlabel('C (log scale)')
                plt.ylabel(title)
                plt.grid(True)
            plt.tight_layout()
            plt.show()
        return results_df
    except Exception as e:
        print(f"Parameter visualization failed: {e}")
        return pd.DataFrame()


def plot_learning_curve(X, y, model_type='svm', kernels=('linear', 'rbf', 'poly'),
                        train_sizes=np.linspace(0.1, 1.0, 10)):
    """Plot learning curves showing how the training-set size affects performance.

    Parameters:
        X, y: data and labels
        model_type: model type
        kernels: kernels to compare
        train_sizes: training-set fractions
    """
    try:
        plt.figure(figsize=(15, 5))
        for i, kernel in enumerate(kernels):
            # One subplot per kernel
            plt.subplot(1, len(kernels), i + 1)
            train_acc = []
            test_acc = []
            used_sizes = []  # fractions that actually produced a score
            # Shuffle the data
            indices = np.random.permutation(len(X))
            X_shuffled = X[indices]
            y_shuffled = y[indices]
            for size in train_sizes:
                try:
                    # Split into training and test sets
                    train_size = max(10, int(len(X) * size))  # at least 10 training samples
                    if train_size >= len(X) - 10:
                        train_size = len(X) - 10  # keep at least 10 test samples
                    X_train, X_test = X_shuffled[:train_size], X_shuffled[train_size:train_size + 10]
                    y_train, y_test = y_shuffled[:train_size], y_shuffled[train_size:train_size + 10]
                    # Skip if a class is missing from either split
                    if len(np.unique(y_train)) < 2 or len(np.unique(y_test)) < 2:
                        continue
                    # Train the model
                    if kernel == 'linear':
                        model = SVC(kernel=kernel, C=1.0)
                    elif kernel == 'rbf':
                        model = SVC(kernel=kernel, C=10.0, gamma=0.1)
                    else:  # poly
                        model = SVC(kernel=kernel, C=1.0, degree=3)
                    model.fit(X_train, y_train)
                    # Evaluate the model
                    train_acc.append(model.score(X_train, y_train))
                    test_acc.append(model.score(X_test, y_test))
                    used_sizes.append(size)
                except Exception as e:
                    print(f"Learning-curve computation failed (kernel={kernel}, size={size}): {e}")
            # Draw the learning curve
            if len(train_acc) > 0:  # make sure there is something to plot
                plt.plot(used_sizes, train_acc, 'o-', label='Training accuracy')
                plt.plot(used_sizes, test_acc, 's-', label='Test accuracy')
            else:
                plt.text(0.5, 0.5, "Not enough data for a learning curve",
                         ha='center', va='center', transform=plt.gca().transAxes)
            plt.title(f'Learning curve ({kernel} kernel)')
            plt.xlabel('Training-set fraction')
            plt.ylabel('Accuracy')
            plt.grid(True)
            plt.legend(loc='best')
        plt.tight_layout()
        plt.show()
    except Exception as e:
        print(f"Failed to plot the learning curves: {e}")


def create_comprehensive_performance_report(models, X_test, y_test, model_names):
    """Create a combined performance report."""
    try:
        # Build the subplots
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=('Model comparison', 'Confusion-matrix heat map',
                            'ROC curves', 'Feature importance'),
            specs=[[{"type": "bar"}, {"type": "heatmap"}],
                   [{"type": "scatter"}, {"type": "bar"}]]
        )
        # Metrics of every model
        results = {}
        colors = ['blue', 'red', 'green', 'orange', 'purple']
        # Keep only the usable models
        valid_models = []
        valid_model_names = []
        for model, name in zip(models, model_names):
            try:
                # Check that the model can predict
                y_pred = model.predict(X_test)
                valid_models.append(model)
                valid_model_names.append(name)
            except Exception as e:
                print(f"Model {name} is not usable: {e}")
        if not valid_models:
            print("No valid model to evaluate")
            return None, {}
        for i, (model, name) in enumerate(zip(valid_models, valid_model_names)):
            # Predict
            y_pred = model.predict(X_test)
            try:
                y_pred_proba = model.predict_proba(X_test)[:, 1] if hasattr(model, 'predict_proba') else None
            except Exception:
                y_pred_proba = None
            # Compute the metrics
            accuracy, precision, recall, f1 = calculate_metrics(y_test, y_pred)
            results[name] = {'accuracy': accuracy, 'precision': precision,
                             'recall': recall, 'f1_score': f1}
            # Confusion matrix
            if i == 0:  # only for the first model
                cm = confusion_matrix(y_test, y_pred)
                # Row-normalize the confusion matrix
                cm_sum = cm.sum(axis=1)
                cm_norm = np.zeros_like(cm, dtype=float)
                for j in range(len(cm_sum)):
                    if cm_sum[j] > 0:
                        cm_norm[j] = cm[j] / cm_sum[j]
                # Add the confusion-matrix heat map
                fig.add_trace(
                    go.Heatmap(
                        z=cm_norm,
                        x=['Predicted -1', 'Predicted 1'],
                        y=['Actual -1', 'Actual 1'],
                        colorscale='Blues',
                        showscale=True,
                        text=[[f'{cm[r, c]}<br>({cm_norm[r, c]:.1%})' for c in range(2)]
                              for r in range(2)],
                        hoverinfo='text'
                    ),
                    row=1, col=2
                )
            # ROC curve
            if y_pred_proba is not None:
                try:
                    fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
                    auc_score = auc(fpr, tpr)
                    results[name]['auc'] = auc_score
                    fig.add_trace(
                        go.Scatter(x=fpr, y=tpr, mode='lines',
                                   name=f'{name} (AUC={auc_score:.3f})',
                                   line=dict(color=colors[i % len(colors)])),
                        row=2, col=1
                    )
                except Exception as e:
                    print(f"Error while computing the ROC curve: {e}")
        # Add the random-classifier line
        fig.add_trace(
            go.Scatter(x=[0, 1], y=[0, 1], mode='lines',
                       name='Random classifier', line=dict(dash='dash', color='black')),
            row=2, col=1
        )
        # Bar chart comparing the metrics
        metrics = ['accuracy', 'precision', 'recall', 'f1_score']
        metric_names = ['Accuracy', 'Precision', 'Recall', 'F1 score']
        for i, (metric, metric_name) in enumerate(zip(metrics, metric_names)):
            values = [results[name].get(metric, 0) for name in valid_model_names]
            fig.add_trace(
                go.Bar(x=valid_model_names, y=values, name=metric_name,
                       marker_color=colors[i % len(colors)]),
                row=1, col=1
            )
        # Feature importance (only available for a linear model)
        has_linear = False
        for model, name in zip(valid_models, valid_model_names):
            if hasattr(model, 'coef_') and len(model.coef_) > 0:
                has_linear = True
                importance = np.abs(model.coef_[0])
                fig.add_trace(
                    go.Bar(x=importance,
                           y=[f'Feature {i + 1}' for i in range(len(importance))],
                           orientation='h',
                           name=name),
                    row=2, col=2
                )
                break  # show the feature importance of one linear model only
        if not has_linear:
            # The fourth subplot (row 2, col 2) uses the x4/y4 axes
            fig.add_annotation(
                text="Non-linear model<br>feature importance unavailable",
                x=0.5, y=0.5,
                xref="x4 domain", yref="y4 domain",
                showarrow=False,
                font=dict(size=14)
            )
        fig.update_layout(height=800, showlegend=True,
                          title_text="SVM performance report")
        fig.update_xaxes(title_text="Model", row=1, col=1)
        fig.update_yaxes(title_text="Score", row=1, col=1)
        fig.update_xaxes(title_text="False positive rate", row=2, col=1)
        fig.update_yaxes(title_text="True positive rate", row=2, col=1)
        fig.update_xaxes(title_text="Importance", row=2, col=2)
        fig.update_yaxes(title_text="Feature", row=2, col=2)
        return fig, results
    except Exception as e:
        print(f"Failed to create the performance report: {e}")
        return None, {}


# ==================== Automatic hyperparameter tuning ====================
def auto_hyperparameter_tuning(X_train, y_train, cv=5, dataset_type=None):
    """Automatic SVM tuning, with a parameter grid chosen per dataset type."""
    try:
        # Choose the parameter grid from the dataset type
        if dataset_type == 'data1' or dataset_type == 'linear':
            # Linear data favors the linear kernel
            param_grid = [
                {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
                {'kernel': ['rbf'], 'C': [1, 10, 100], 'gamma': [0.1, 1, 'scale']}
            ]
            print("Tuning for a linearly separable dataset...")
        elif dataset_type == 'data2' or dataset_type == 'spiral':
            # Spiral data favors the RBF and polynomial kernels
            param_grid = [
                {'kernel': ['rbf'], 'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10]},
                {'kernel': ['poly'], 'C': [0.1, 1, 10], 'gamma': [0.1, 1], 'degree': [2, 3, 4]}
            ]
            print("Tuning for a spiral dataset...")
        else:
            # Generic parameter grid
            param_grid = [
                {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
                {'kernel': ['rbf'], 'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1, 'scale']},
            ]
            print("Tuning for a generic dataset...")
        # For a small dataset, shrink the grid
        if len(X_train) < 50:
            print("Small dataset, using a reduced grid...")
            param_grid = [
                {'kernel': ['linear'], 'C': [1, 10]},
                {'kernel': ['rbf'], 'C': [1, 10], 'gamma': ['scale']}
            ]
            cv = min(cv, 3)  # fewer cross-validation folds
        # Grid search
        grid_search = GridSearchCV(
            SVC(probability=True),
            param_grid,
            cv=cv,
            scoring='accuracy',
            n_jobs=-1,
            verbose=1
        )
        print("Starting automatic tuning...")
        grid_search.fit(X_train, y_train)
        print(f"Best parameters: {grid_search.best_params_}")
        print(f"Best cross-validation score: {grid_search.best_score_:.4f}")
        return grid_search.best_estimator_, grid_search.best_params_
    except Exception as e:
        print(f"Automatic tuning failed: {e}")
        # Fall back to a default model
        default_model = SVC(kernel='linear', C=1.0, probability=True)
        default_model.fit(X_train, y_train)
        return default_model, {'kernel': 'linear', 'C': 1.0}


# ==================== Main function ====================
def main():
    """Main function: run every demo step."""
    print('=' * 60)
    print('Classification with SVM')
    print('=' * 60)
    # 1. Data loading - try the CSV files first
    print("\nStep 1: data loading")
    # Try linear.csv and spiral.csv
    csv_files = ['linear.csv', 'spiral.csv']  # replace with the actual file paths
    try:
        # Try linear.csv first
        X, y = load_csv_with_specific_columns(csv_files[0])
        print("CSV file loaded")
    except Exception as e:
        print(f"CSV loading failed: {e}")
        print("Using simulated data...")
        X, y = generate_linear_data()  # linearly separable data as the default
    # 2. Inspection, preprocessing, and visualization
    print("\nStep 2: data inspection and preprocessing")
    # Check for NaN values
    if np.isnan(X).any() or np.isnan(y).any():
        print("The data contains NaN values, preprocessing...")
        X, y = preprocess_data(X, y)
    # Basic statistics
    print(f"Data shape: X={X.shape}, y={y.shape}")
    print(f"Class distribution - class -1: {np.sum(y == -1)}, class 1: {np.sum(y == 1)}")
    # Visualize the data
    plt.figure(figsize=(10, 8))
    if X.shape[1] >= 2:  # 2D plots need at least two features
        plt.scatter(X[y == -1, 0], X[y == -1, 1],
                    color='red', marker='o', label='Class -1')
        plt.scatter(X[y == 1, 0], X[y == 1, 1],
                    color='blue', marker='x', label='Class 1')
    else:  # one feature: use y = 0 as the second axis
        plt.scatter(X[y == -1], np.zeros_like(X[y == -1]),
                    color='red', marker='o', label='Class -1')
        plt.scatter(X[y == 1], np.zeros_like(X[y == 1]),
                    color='blue', marker='x', label='Class 1')
    plt.title('Dataset', fontsize=14)
    plt.xlabel('Feature 1', fontsize=12)
    plt.ylabel('Feature 2' if X.shape[1] >= 2 else 'Y = 0', fontsize=12)
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.show()
    # 3. Standardization
    print("\nStep 3: standardization")
    # Split first and fit the scaler on the training set only, so that
    # test-set statistics do not leak into training
    X_train_raw, X_test_raw, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train_raw)
    X_test = scaler.transform(X_test_raw)
    X_scaled = scaler.transform(X)  # full dataset, used for the plots and demos below
    print(f"Training set size: {X_train.shape}")
    print(f"Test set size: {X_test.shape}")
    # 4. Training with the hand-written SMO algorithm
    print("\nStep 4: training with the hand-written SMO algorithm")
    if X.shape[1] >= 2:
        # Use the first two features for the linear SVM demo
        X_demo = X_scaled[:, :2]
        X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
            X_demo, y, test_size=0.2, random_state=42)
        try:
            # Train a linear SVM
            alpha, b = SMO(X_train_demo, y_train_demo, ker=linear_kernel, C=1.0, max_iter=100)
            # Compute the weights and the support vectors
            sup_idx = alpha > 1e-5
            if np.sum(sup_idx) > 0:  # make sure there are support vectors
                w = np.sum((alpha[sup_idx] * y_train_demo[sup_idx]).reshape(-1, 1)
                           * X_train_demo[sup_idx], axis=0)
                print(f'Number of support vectors: {np.sum(sup_idx)}')
                print(f'Weight vector w = [{w[0]:.4f}, {w[1]:.4f}]')
                print(f'Bias b = {b:.4f}')
                # Draw the decision boundary of the hand-written SMO
                plt.figure(figsize=(10, 8))
                # Build the grid
                x_min, x_max = X_train_demo[:, 0].min() - 1, X_train_demo[:, 0].max() + 1
                y_min, y_max = X_train_demo[:, 1].min() - 1, X_train_demo[:, 1].max() + 1
                xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                                     np.arange(y_min, y_max, 0.02))
                # Predictions on the grid
                Z = np.sign(xx * w[0] + yy * w[1] + b)
                # Decision regions
                plt.contourf(xx, yy, Z, alpha=0.3, cmap=ListedColormap(['#FFAAAA', '#AAAAFF']))
                # Data points
                plt.scatter(X_train_demo[y_train_demo == -1, 0], X_train_demo[y_train_demo == -1, 1],
                            color='red', marker='o', label='Training set - class -1')
                plt.scatter(X_train_demo[y_train_demo == 1, 0], X_train_demo[y_train_demo == 1, 1],
                            color='blue', marker='x', label='Training set - class 1')
                # Support vectors
                plt.scatter(X_train_demo[sup_idx, 0], X_train_demo[sup_idx, 1],
                            s=100, facecolors='none', edgecolors='green', linewidth=2,
                            label='Support vectors')
                # Test points
                plt.scatter(X_test_demo[:, 0], X_test_demo[:, 1],
                            marker='s', c=y_test_demo, cmap=ListedColormap(['red', 'blue']),
                            alpha=0.3, s=50, label='Test set')
                # Hyperplane
                plt.plot([x_min, x_max],
                         [(-b - w[0] * x_min) / w[1], (-b - w[0] * x_max) / w[1]],
                         'k-', linewidth=2)
                # Margins
                plt.plot([x_min, x_max],
                         [(-b - w[0] * x_min - 1) / w[1], (-b - w[0] * x_max - 1) / w[1]],
                         'k--', linewidth=1)
                plt.plot([x_min, x_max],
                         [(-b - w[0] * x_min + 1) / w[1], (-b - w[0] * x_max + 1) / w[1]],
                         'k--', linewidth=1)
                plt.title('SVM decision boundary from the hand-written SMO', fontsize=14)
                plt.xlabel('Feature 1', fontsize=12)
                plt.ylabel('Feature 2', fontsize=12)
                plt.legend()
                plt.grid(True, linestyle='--', alpha=0.3)
                plt.xlim(x_min, x_max)
                plt.ylim(y_min, y_max)
                plt.tight_layout()
                plt.show()
                # Predict and evaluate
                y_pred_demo = np.sign(X_test_demo @ w.reshape(-1, 1) + b).flatten()
                accuracy, precision, recall, f1 = calculate_metrics(y_test_demo, y_pred_demo)
                print(f'Hand-written SMO SVM - accuracy: {accuracy:.4f}, precision: {precision:.4f}, '
                      f'recall: {recall:.4f}, F1: {f1:.4f}')
            else:
                print("SMO found no support vectors; skipping the manual SVM demo")
        except Exception as e:
            print(f"Hand-written SMO training failed: {e}")
            print("Skipping the manual SVM demo")
    else:
        print("Not enough feature dimensions; skipping the manual SVM demo")
    # 5. Kernel comparison
    print("\nStep 5: kernel comparison")
    # Train sklearn SVM models with different kernels
    kernels = ['linear', 'rbf', 'poly', 'sigmoid']
    kernel_names = ['Linear kernel', 'RBF kernel', 'Polynomial kernel', 'Sigmoid kernel']
    fig, axs = plt.subplots(2, 2, figsize=(18, 14))
    axs = axs.flatten()
    models = []
    names = []
    for i, (kernel, name) in enumerate(zip(kernels, kernel_names)):
        try:
            # Per-kernel parameters
            if kernel == 'linear':
                model = SVC(kernel=kernel, C=1.0, probability=True)
            elif kernel == 'rbf':
                model = SVC(kernel=kernel, C=10.0, gamma=0.1, probability=True)
            elif kernel == 'poly':
                model = SVC(kernel=kernel, C=1.0, degree=3, gamma=0.1, probability=True)
            else:  # sigmoid
                model = SVC(kernel=kernel, C=1.0, gamma=0.1, probability=True)
            # Train the model
            model.fit(X_train, y_train)
            models.append(model)
            names.append(name)
            # Evaluate the model
            y_pred = model.predict(X_test)
            accuracy, precision, recall, f1 = calculate_metrics(y_test, y_pred)
            print(f'{name} SVM - accuracy: {accuracy:.4f}, precision: {precision:.4f}, '
                  f'recall: {recall:.4f}, F1: {f1:.4f}')
            # Draw the decision boundary
            try:
                plot_decision_boundary_enhanced(X_scaled, y, model,
                                                title=f'{name} SVM',
                                                ax=axs[i],
                                                confidence=True)
            except Exception as e:
                print(f"Failed to draw the decision boundary ({name}): {e}")
                axs[i].set_title(f"{name} SVM (plot failed)")
                axs[i].text(0.5, 0.5, "Failed to draw the decision boundary",
                            ha='center', va='center', transform=axs[i].transAxes,
                            bbox=dict(facecolor='red', alpha=0.1))
        except Exception as e:
            print(f"Model training failed ({name}): {e}")
            axs[i].text(0.5, 0.5, f"Model training failed: {name}",
                        ha='center', va='center', transform=axs[i].transAxes,
                        bbox=dict(facecolor='red', alpha=0.1))
    plt.tight_layout()
    plt.show()
    # 6. Automatic tuning
    print("\nStep 6: automatic hyperparameter tuning")
    # Pick the dataset type - simple low-dimensional data uses the 'data1' grid
    if X.shape[1] <= 2:  # with at most two features the data may be linear or spiral
        # Simplification: assume linear data
        dataset_type = 'data1'
    else:
        # Higher-dimensional data uses the generic grid
        dataset_type = None
    best_model, best_params = auto_hyperparameter_tuning(X_train, y_train, dataset_type=dataset_type)
    # Show the decision boundary of the best model
    if X.shape[1] >= 2:
        fig_best, ax_best = plt.subplots(figsize=(10, 8))
        try:
            plot_decision_boundary_enhanced(X_scaled, y, best_model,
                                            title=f"Best SVM model ({best_model.kernel})",
                                            ax=ax_best,
                                            confidence=True,
                                            show_margin=True)
            plt.show()
        except Exception as e:
            print(f"Failed to draw the decision boundary of the best model: {e}")
    # 7. Effect of the parameters on performance
    print("\nStep 7: effect of the parameters on performance")
    # Only run with enough data points and no missing values
    if X_train.shape[0] > 20 and not np.isnan(X_train).any():
        results_df = visualize_metrics_over_C_gamma(X_train, y_train, X_test, y_test,
                                                    kernel=best_model.kernel)
    else:
        print("Not enough data or missing values present; skipping the parameter visualization")
    # 8. Learning curves
    print("\nStep 8: learning-curve analysis")
    # Only run with enough data
    if X.shape[0] > 50 and not np.isnan(X).any():
        plot_learning_curve(X_scaled, y, kernels=['linear', 'rbf'])
    else:
        print("Not enough data or missing values present; skipping the learning-curve analysis")
    # 9. 3D visualization
    print("\nStep 9: 3D visualization")
    # Check that the data is suitable for a 3D plot
    if not np.isnan(X_scaled).any() and len(X) > 10:
        try:
            fig_3d = create_3d_visualization_advanced(X_scaled, y,
                                                      method='pca',
                                                      model=best_model,
                                                      title_suffix="(PCA)")
            fig_3d.show()
        except Exception as e:
            print(f"Failed to create the 3D visualization: {e}")
    else:
        print("The data is not suitable for a 3D visualization; skipping this step")
    # 10. Animation
    print("\nStep 10: decision-boundary animation")
    # Check that the data is suitable for the animation
    if not np.isnan(X_scaled).any() and len(X) > 10 and X.shape[1] >= 2:
        try:
            # Keep only the usable models
            valid_models = []
            valid_names = []
            for model, name in zip(models, names):
                try:
                    # Check that the model can predict
                    model.predict(X_test[:1])
                    valid_models.append(model)
                    valid_names.append(name)
                except Exception:
                    continue
            if valid_models:
                # Add the best model
                if best_model not in valid_models:
                    valid_models.append(best_model)
                    valid_names.append('Best model')
                # Build the animation
                anim_fig = create_animated_decision_boundary(X_scaled, y, valid_models, valid_names)
                if anim_fig:
                    anim_fig.show()
            else:
                print("No valid model for the animation")
        except Exception as e:
            print(f"Failed to create the animation: {e}")
    else:
        print("The data is not suitable for the animation; skipping this step")
    # 11. Performance report
    print("\nStep 11: performance report")
    # Collect all models
    all_models = models.copy()
    all_names = names.copy()
    # Add the best model
    if best_model not in all_models:
        all_models.append(best_model)
        all_names.append('Best model')
    # Build the report
    performance_fig, performance_results = create_comprehensive_performance_report(
        all_models, X_test, y_test, all_names)
    if performance_fig:
        performance_fig.show()
    # 12. Summary
    print("\n=== All demos finished ===")
    print(f"Best model parameters: {best_params}")
    print("Data processing, model training, visualization, and evaluation are complete!")


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"The program failed: {e}")
        print("Falling back to a simplified demo of the basic functionality...")
        # Simplified demo
        print("\nSimplified demo:")
        try:
            # Try to load the CSV
            X, y = load_csv_with_specific_columns('linear.csv')
            # Handle missing values
            if np.isnan(X).any() or np.isnan(y).any():
                imputer = SimpleImputer(strategy='mean')
                X = imputer.fit_transform(X)
                # Drop the samples whose label is missing
                valid_indices = ~np.isnan(y)
                if not all(valid_indices):
                    X = X[valid_indices]
                    y = y[valid_indices]
        except Exception:
            X, y = generate_linear_data()
        # Simple 2D plot
        plt.figure(figsize=(8, 6))
        if X.shape[1] >= 2:
            colors = ['red' if label == -1 else 'blue' for label in y]
            plt.scatter(X[:, 0], X[:, 1], c=colors, alpha=0.7)
        else:
            plt.scatter(X[:, 0], np.zeros_like(X[:, 0]),
                        c=['red' if label == -1 else 'blue' for label in y],
                        alpha=0.7)
        plt.title("Data visualization")
        plt.xlabel("Feature 1")
        plt.ylabel("Feature 2" if X.shape[1] >= 2 else "")
        plt.legend(['Negative class (-1)', 'Positive class (1)'])
        plt.grid(True, linestyle='--', alpha=0.5)
        plt.show()
        # Simple SVM model
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        try:
            model = SVC(kernel='linear')
            model.fit(X_train, y_train)
            print(f"Model accuracy: {model.score(X_test, y_test):.4f}")
        except Exception as e:
            print(f"Model training failed: {e}")
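IV. What I Learned

The support vector machine is a powerful and elegant algorithm that combines optimization theory, convex analysis, and kernel methods into a practical classifier. Through this experiment I learned both the theory and the implementation of SVMs and, more importantly, connected the two, which improved my ability to analyze problems and build complex systems. The special handling required for the label column in particular showed me that, in real applications, an algorithm often has to be adapted to the task at hand. My main takeaways from this experiment are as follows:

(1) Connecting theory and practice

In the textbook, SVM theory looks very abstract, especially the Lagrange multipliers, the dual problem, and the KKT conditions. Implementing the SMO algorithm by hand made the practical meaning of these ideas clear to me:

1. An intuitive feel for the maximum margin: visualizing the decision boundary let me see how an SVM maximizes the margin while still classifying the training points correctly, which made the abstract optimization objective concrete.

2. Why the dual problem matters: I used to know only that SVM training is converted into a dual problem, without understanding why. Through the implementation I found that the dual form can be cheaper to solve and, because it involves the data only through inner products, it is what makes the kernel trick possible.

3. The role of support vectors: most training points end up with a Lagrange multiplier of zero, and only a few support vectors actually determine the decision boundary. This sparsity makes prediction cheaper and helps the model generalize.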
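Point 3 can be checked numerically. The following is a minimal sketch, assuming scikit-learn's `SVC` and a synthetic two-cluster dataset from `make_blobs`; the data, the cluster centers, and the variable names are illustrative assumptions and do not come from the experiment. It counts the support vectors and rebuilds the weight vector from the dual coefficients, w = Σ αᵢ yᵢ xᵢ.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, well-separated two-class data (an assumption for illustration only).
X, y = make_blobs(n_samples=200, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=0)

model = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the support vectors have non-zero Lagrange multipliers.
n_support = len(model.support_)

# dual_coef_ stores alpha_i * y_i for the support vectors, so the primal
# weight vector is w = sum_i alpha_i * y_i * x_i.
w_from_dual = (model.dual_coef_ @ model.support_vectors_).ravel()
w_primal = model.coef_.ravel()

print(f"support vectors: {n_support} of {len(X)} samples")
print("w rebuilt from the dual matches coef_:", np.allclose(w_from_dual, w_primal))
```

The same reconstruction is what the manual SMO step in `main()` does with `alpha[sup_idx] * y_train_demo[sup_idx]`.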

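(2) Choosing a kernel and its effect

The experiment tried several kernels (linear, RBF, polynomial, sigmoid) and compared how they behave on different kinds of data:

1. Linear kernel: performs very well on linearly separable data, gives a simple model, and is fast to compute, but it cannot find a useful decision boundary on complex data.

2. RBF kernel: the most adaptable, able to capture a wide range of complex patterns, but harder to tune. The γ parameter in particular has a strong effect on the model: too small a value leads to underfitting, too large a value leads to overfitting.

3. Polynomial kernel: does well on some specific problems, but it is expensive to compute and numerically less stable. The degree has to be chosen carefully.

4. Sigmoid kernel: interesting in theory, but in practice it often performs worse than the other kernels, and its parameters are harder to tune.

The 3D visualizations and animations let me see clearly how different kernels build decision boundaries in feature space, which deepened my understanding of what kernel methods really do.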
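The contrast between the linear and RBF kernels, and the sensitivity to γ, can be reproduced on a small synthetic problem. This is a sketch under stated assumptions: it uses scikit-learn's `make_circles` (two concentric rings, so not linearly separable) rather than the experiment's data, and the γ values are arbitrary picks for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: a dataset no straight line can separate (illustrative only).
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Compare a linear and an RBF kernel with otherwise identical settings.
scores = {}
for kernel in ("linear", "rbf"):
    model = SVC(kernel=kernel, C=1.0, gamma="scale").fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)
    print(f"{kernel:>6} kernel: test accuracy = {scores[kernel]:.3f}")

# Sweep gamma for the RBF kernel and compare training and test accuracy.
for gamma in (0.01, 1.0, 100.0):
    model = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X_train, y_train)
    print(f"gamma={gamma:>6}: train={model.score(X_train, y_train):.3f}, "
          f"test={model.score(X_test, y_test):.3f}")
```

A widening gap between training and test accuracy as γ grows is the usual sign of overfitting; on real data the useful range of γ depends on the feature scaling, so the values above should not be read as recommendations.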

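(3) The importance of data processing

This project paid particular attention to CSV data processing, especially the special handling of the label column, which showed me how much preprocessing matters for a machine-learning model:

1. Missing values: filling feature columns with the mean is common practice, but the label column calls for a more careful strategy.

2. Mapping strings to numbers: designing a mapping that keeps the meaning of the original data while satisfying the algorithm's requirements is a key challenge in practice.

3. The need for standardization: without standardization, some features can dominate the model's decisions, and the experiment showed a clear effect of standardization on SVM performance.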
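The three points above can be sketched in a few lines. This is an illustrative example only: the tiny table is made up, the column names follow the dataset description (`Trip Miles`, `Fare`, `Payment Type`), and `LABEL_MAP` is a hypothetical mapping, not the one used by `load_csv_with_specific_columns`. Rows with a missing label are dropped rather than imputed, and the imputer and scaler are fitted on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hand-made sample rows (fabricated for illustration, not taken from the dataset).
df = pd.DataFrame({
    "Trip Miles": [0.8, 5.2, np.nan, 12.0, 1.1, 3.3, 7.5, 0.5],
    "Fare": [8.75, 18.0, 10.5, np.nan, 6.0, 12.25, 25.0, 5.5],
    "Payment Type": ["Credit Card", "Cash", "Cash", "Credit Card",
                     None, "Cash", "Credit Card", "Cash"],
})

# Hypothetical string-to-label mapping for the two classes of interest.
LABEL_MAP = {"Cash": -1, "Credit Card": 1}

# Labels: drop rows whose label is missing or unmapped instead of imputing them.
df = df[df["Payment Type"].isin(list(LABEL_MAP))]
y = df["Payment Type"].map(LABEL_MAP).to_numpy()
X = df[["Trip Miles", "Fare"]].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Features: mean imputation and standardization, both fitted on the training split.
imputer = SimpleImputer(strategy="mean").fit(X_train)
scaler = StandardScaler().fit(imputer.transform(X_train))
X_train_s = scaler.transform(imputer.transform(X_train))
X_test_s = scaler.transform(imputer.transform(X_test))

print("labels:", y)
print("training feature means after scaling:", X_train_s.mean(axis=0).round(6))
```

Fitting the imputer and scaler on the training split keeps test-set statistics out of training, which matters when the reported accuracy is meant to reflect unseen data.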

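(4) The value of visualization

Interactive visualization is not just attractive; it is a powerful tool for understanding and debugging a model:

1. Decision-boundary plots: by plotting the decision boundary together with the support vectors, I could judge at a glance whether the model was overfitting or underfitting.

2. Parameter analysis: the 3D charts showed how C and γ affect model performance, which helped me tune the parameters in a more targeted way.

3. Dimensionality reduction: using PCA and t-SNE for 3D visualization helped me understand the structure of high-dimensional data and how the model behaves in the actual feature space.

4. Animation: showing how the decision boundary changes across kernels gives a dynamic view that conveys more than static charts.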

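(5) What I learned from writing the Python code

From a coding point of view, the project also taught me a lot:

1. Modular design: splitting a complex system into modules for data access, algorithm implementation, visualization, and automatic tuning made the code much easier to read and maintain.

2. Error handling: real data produces far more exceptional cases than expected, and thorough error handling with fallback strategies kept the system running reliably.

3. Algorithm efficiency: implementing SMO showed me how much optimization matters, especially techniques such as heuristic variable selection and precomputing the kernel matrix.

4. Interactive design: building an interactive interface is much more complex than plain data processing, but the improvement in user experience is significant.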
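The kernel-matrix precomputation mentioned in point 3 can be illustrated as follows. This is a sketch, not the report's SMO code: `rbf_kernel_pair` mirrors the closure returned by `rbf_kernel(sigma)` in the listing, while `rbf_gram_matrix` is a hypothetical helper that computes all pairwise kernel values at once, so an SMO loop could look values up instead of recomputing them.

```python
import numpy as np


def rbf_kernel_pair(x, y, sigma):
    """RBF kernel for one pair of points (same formula as rbf_kernel in the listing)."""
    diff = x - y
    return np.exp(-np.inner(diff, diff) / (2.0 * sigma ** 2))


def rbf_gram_matrix(X, sigma):
    """Hypothetical helper: the full kernel matrix K[i, j] = k(x_i, x_j), vectorized."""
    sq_norms = np.sum(X ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives from rounding
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

K_fast = rbf_gram_matrix(X, sigma=1.5)
K_slow = np.array([[rbf_kernel_pair(a, b, 1.5) for b in X] for a in X])

print("vectorized matrix matches the pairwise loop:", np.allclose(K_fast, K_slow))
```

The trade-off is memory: the matrix has n² entries, so for the 100,000-row sample required by the assignment a full precomputation would not fit in memory, and a row cache or a subsample would be needed instead.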

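(6) Future improvements

1. More kernels: implement additional special-purpose kernels, such as the chi-square kernel and the wave kernel, and explore how they perform on specific problems.

2. A better SMO: the current implementation is a simplified SMO; a complete heuristic variable-selection strategy could be added to speed up convergence.

3. Multi-class classification: extend the SVM to multi-class problems with a one-vs-one or one-vs-all strategy.

4. Ensemble learning: use the SVM as a base learner and explore ensemble approaches such as SVM bagging or multiple-kernel fusion.

5. Online learning: explore incremental SVM algorithms so that the model can handle streaming data.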
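As a starting point for item 3, the two multi-class strategies can be sketched with scikit-learn's wrappers. This assumes `OneVsRestClassifier` and `OneVsOneClassifier` from `sklearn.multiclass` and a synthetic four-class dataset; it is not an extension of the hand-written SMO, only an illustration of how many binary SVMs each strategy trains.

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic four-class data (an assumption for illustration only).
X, y = make_blobs(n_samples=400, centers=4, cluster_std=1.0, random_state=0)

# One-vs-all (one-vs-rest): one binary SVM per class.
ovr = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)

# One-vs-one: one binary SVM per pair of classes, k * (k - 1) / 2 in total.
ovo = OneVsOneClassifier(SVC(kernel="rbf", gamma="scale")).fit(X, y)

print("one-vs-rest classifiers:", len(ovr.estimators_))
print("one-vs-one classifiers:", len(ovo.estimators_))
```

One-vs-one trains more models, but each sees only the samples of two classes, which can matter for kernel SVMs whose training cost grows quickly with the number of samples.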

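V. Reflections

Although support vector machines have been overshadowed by the deep-learning boom in recent years, they remain a cornerstone of machine learning and are still hard to replace in many scenarios. This experiment deepened my understanding of machine learning, strengthened my ability to solve practical problems, and increased my interest in AI algorithms. It was a very valuable learning experience.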
