Background
While reviewing logs recently, I found that one service occasionally throws Caused by: java.net.UnknownHostException.
At the same time, Eureka's access.log showed the following error:
10.xxx.xxx.xxx - - [27/May/2025:23:57:29 +0800] "PUT /eureka/apps/{appName}/{host}:xxx-job:8082?status=UP&lastDirtyTimestamp=1748351637173 HTTP/1.1" 404 -
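To pick failed heartbeats out of access.log, a line like the one above can be matched with a regular expression. This is a minimal sketch, assuming the log layout shown above (it accepts both straight and curly quotes around the request). The class and method names (HeartbeatLogLine, parse, isFailedRenewal) are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that parses Eureka access.log lines for heartbeat (PUT) requests. */
final class HeartbeatLogLine {
    // Assumed layout:
    // client - - [timestamp] "PUT /eureka/apps/<app>/<instanceId>?query HTTP/1.1" status size
    private static final Pattern PUT_LINE = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] [\"\u201C]PUT /eureka/apps/([^/]+)/([^?\\s]+)\\S*"
            + " HTTP/[0-9.]+[\"\u201D] (\\d{3}) .*$");

    final String client;
    final String timestamp;
    final String appName;
    final String instanceId;
    final int status;

    private HeartbeatLogLine(String client, String timestamp, String appName,
                             String instanceId, int status) {
        this.client = client;
        this.timestamp = timestamp;
        this.appName = appName;
        this.instanceId = instanceId;
        this.status = status;
    }

    /** Returns the parsed line, or empty if the line is not a heartbeat PUT in the assumed layout. */
    static Optional<HeartbeatLogLine> parse(String line) {
        Matcher m = PUT_LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new HeartbeatLogLine(
            m.group(1), m.group(2), m.group(3), m.group(4), Integer.parseInt(m.group(5))));
    }

    /** A 404 on a heartbeat means the server has no lease for this instance. */
    boolean isFailedRenewal() {
        return status == 404;
    }
}
```

Filtering a log file then comes down to calling parse on each line and keeping the entries where isFailedRenewal() is true.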
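Correlating the two problems
1. Problem chain
Eureka heartbeat renewal fails (404) → service drops out of the registry → service discovery fails → DNS resolution fails → UnknownHostException
2. Detailed flow
- The xxx-job service sends a heartbeat renewal request to Eureka.
- Eureka Server returns 404, meaning that the service instance does not exist in its registry.
- After several failures, Eureka Server removes the service instance from the registry.
- When other services call xxx-job, they cannot get its instance information from Eureka.
- Spring Cloud LoadBalancer cannot resolve the service name xxx-job to an instance.
- Finally, UnknownHostException: xxx-job is thrown.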
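The 404 step in this flow can be illustrated with a toy in-memory registry. This is only a simplified model written for this note, not Eureka's actual code: ToyRegistry and its methods are hypothetical names, and the current time is passed in explicitly so that the behavior is deterministic.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Toy model of a lease registry; an illustration only, not Eureka's implementation. */
final class ToyRegistry {
    private final Map<String, Long> lastRenewalMillis = new HashMap<>();
    private final long leaseDurationMillis;

    ToyRegistry(long leaseDurationMillis) {
        this.leaseDurationMillis = leaseDurationMillis;
    }

    /** Registers (or re-registers) an instance and returns 204, like a successful registration. */
    int register(String instanceId, long nowMillis) {
        lastRenewalMillis.put(instanceId, nowMillis);
        return 204;
    }

    /** Heartbeat: 200 if a lease exists, 404 if the registry does not know the instance. */
    int renew(String instanceId, long nowMillis) {
        if (!lastRenewalMillis.containsKey(instanceId)) {
            return 404;
        }
        lastRenewalMillis.put(instanceId, nowMillis);
        return 200;
    }

    /** Removes every lease not renewed within the lease duration; returns how many were removed. */
    int evictExpired(long nowMillis) {
        int evicted = 0;
        Iterator<Map.Entry<String, Long>> it = lastRenewalMillis.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > leaseDurationMillis) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    /** Callers can only discover instances that currently hold a lease. */
    boolean isDiscoverable(String instanceId) {
        return lastRenewalMillis.containsKey(instanceId);
    }
}
```

In this model a heartbeat that arrives after eviction gets 404 and the instance stays undiscoverable until register is called again. As far as I know, the real Eureka client reacts to a 404 on renewal by registering again, which would explain why the error only shows up occasionally.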
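Why the 404 error occurs
1. Incomplete service registration
# Possible configuration problem
eureka:
  client:
    initial-instance-info-replication-interval-seconds: 40 # initial registration delay is too long
    instance-info-replication-interval-seconds: 30
2. Eureka Server eviction policy
# Eureka Server may be evicting instances too aggressively
eureka:
  server:
    eviction-interval-timer-in-ms: 60000 # eviction interval
    enable-self-preservation: false # self-preservation mode is disabled
3. Network problems that cause registration to fail
The host in the instance ID may have a DNS resolution problem.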
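One way to check the DNS suspicion from inside the JVM is to ask Java itself to resolve the host, since that is the code path that ends in UnknownHostException. A minimal sketch follows. DnsCheck and resolves are hypothetical names, and the result reflects the resolver configuration of the machine that runs it.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper for checking whether a host name resolves from this JVM. */
final class DnsCheck {
    /** Returns true if the host name (or IP literal) resolves to at least one address. */
    static boolean resolves(String host) {
        try {
            return InetAddress.getAllByName(host).length > 0;
        } catch (UnknownHostException e) {
            // Same exception type that appears in the service logs.
            return false;
        }
    }
}
```

Running DnsCheck.resolves with the host part of the instance ID, on the machine where the caller runs, shows whether that name is resolvable there.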
Solutions
1. Fix the service registration configuration
eureka:
  client:
    service-url:
      defaultZone: http://eureka-server:8761/eureka/
    register-with-eureka: true
    fetch-registry: true
    initial-instance-info-replication-interval-seconds: 5
    instance-info-replication-interval-seconds: 10
  instance:
    prefer-ip-address: true
    ip-address: xxx.xxx.xxx.xxx
    instance-id: ${spring.application.name}:${spring.cloud.client.ip-address}:${server.port}
    lease-renewal-interval-in-seconds: 10
    lease-expiration-duration-in-seconds: 30
2. Enable Eureka Server self-preservation mode
eureka:
  server:
    enable-self-preservation: true
    renewal-percent-threshold: 0.85
    eviction-interval-timer-in-ms: 120000
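As a rough illustration of what renewal-percent-threshold controls: my understanding is that Eureka Server compares the heartbeats received in the last minute against a threshold derived from the number of registered instances and the expected renewal interval, and stops evicting when renewals fall below it. The sketch below shows that arithmetic under this assumption. The class and method names are hypothetical, and the exact formula may differ between Eureka versions.

```java
/** Hypothetical arithmetic sketch of a self-preservation threshold; not Eureka's code. */
final class SelfPreservationMath {
    /**
     * Expected heartbeats per minute across all instances, scaled by the
     * percent threshold and rounded down.
     */
    static int renewsPerMinuteThreshold(int instances,
                                        int renewalIntervalSeconds,
                                        double percentThreshold) {
        double expectedPerMinute = instances * (60.0 / renewalIntervalSeconds);
        return (int) (expectedPerMinute * percentThreshold);
    }
}
```

For example, under this model 10 instances renewing every 30 seconds are expected to send 20 heartbeats per minute, so a threshold of 0.85 tolerates losing at most 3 of them before self-preservation would kick in.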
3. Add a retry mechanism
# Add retries to the Feign client
feign:
  client:
    config:
      default:
        retryer: feign.Retryer.Default
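The general idea behind such a retryer is to try the call again a limited number of times, with a growing pause between attempts. The sketch below is a plain-Java illustration of that idea, not Feign's implementation: BackoffRetry, delayMillis, and call are hypothetical names, and the 1.5x growth factor is an assumption chosen for the example.

```java
import java.util.concurrent.Callable;

/** Hypothetical retry helper with a capped, growing delay between attempts. */
final class BackoffRetry {
    /** Delay after the given failed attempt (1-based): initial * 1.5^(attempt-1), capped at max. */
    static long delayMillis(int attempt, long initialMillis, long maxMillis) {
        long delay = (long) (initialMillis * Math.pow(1.5, attempt - 1));
        return Math.min(delay, maxMillis);
    }

    /** Calls the task up to maxAttempts times and rethrows the last failure if all attempts fail. */
    static <T> T call(Callable<T> task, int maxAttempts, long initialMillis, long maxMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Wait before the next attempt; no wait after the final failure.
                    Thread.sleep(delayMillis(attempt, initialMillis, maxMillis));
                }
            }
        }
        throw last;
    }
}
```

Retries only help if the instance comes back into the registry within the retry window, so the delays should be chosen with the heartbeat and registry refresh intervals in mind.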
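4. Monitoring and diagnostics
import com.netflix.discovery.EurekaClient;
import com.netflix.discovery.shared.Application;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class ServiceHealthMonitor {
    private static final Logger logger = LoggerFactory.getLogger(ServiceHealthMonitor.class);

    @Autowired
    private EurekaClient eurekaClient;

    // Runs every 30 seconds; scheduling must be enabled with @EnableScheduling.
    @Scheduled(fixedRate = 30000)
    public void checkServiceHealth() {
        // Looks the application up in the client's local copy of the registry.
        Application app = eurekaClient.getApplication("xxx-job");
        if (app == null || app.getInstances().isEmpty()) {
            logger.warn("xxx-job service is unavailable");
        }
    }
}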
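5. Contingency plan
import java.net.UnknownHostException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.retry.annotation.Recover;
import org.springframework.retry.annotation.Retryable;
import org.springframework.stereotype.Component;

@Component
public class ServiceFallback {
    private static final Logger logger = LoggerFactory.getLogger(ServiceFallback.class);

    // Retries up to 3 times on UnknownHostException; Spring Retry must be enabled with @EnableRetry.
    @Retryable(value = UnknownHostException.class, maxAttempts = 3)
    public String callXxxjob() throws UnknownHostException {
        // Service call logic goes here; the throw is only a placeholder so that the class compiles.
        throw new UnsupportedOperationException("service call not implemented");
    }

    // Invoked once all retry attempts have failed.
    @Recover
    public String recover(UnknownHostException ex) {
        logger.error("Service call failed, switching to the fallback strategy", ex);
        return "Service temporarily unavailable";
    }
}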
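Verification steps
- Check the Eureka Dashboard: confirm that the xxx-job service stays online.
- Monitor the logs: observe how often the 404 errors occur and whether they follow a time pattern.
- Test the network: ping host
- Check the service startup order: make sure Eureka Server starts first.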
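For the log-monitoring step, one simple way to see the time pattern is to count the 404 entries per minute. The sketch below assumes that the timestamps have already been extracted from the 404 lines and that they use the format seen in access.log (for example 27/May/2025:23:57:29 +0800). FailuresPerMinute and countByMinute are hypothetical names.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that buckets access.log timestamps by minute. */
final class FailuresPerMinute {
    // Assumed timestamp format, e.g. "27/May/2025:23:57:29 +0800".
    private static final DateTimeFormatter ACCESS_LOG_TIME =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    /** Returns a map from minute (seconds truncated) to the number of timestamps in that minute. */
    static Map<OffsetDateTime, Integer> countByMinute(List<String> timestamps) {
        Map<OffsetDateTime, Integer> counts = new TreeMap<>();
        for (String ts : timestamps) {
            OffsetDateTime minute =
                OffsetDateTime.parse(ts, ACCESS_LOG_TIME).truncatedTo(ChronoUnit.MINUTES);
            counts.merge(minute, 1, Integer::sum);
        }
        return counts;
    }
}
```

Because the result is sorted by time, bursts around deployments, restarts, or a fixed interval are easy to spot when the map is printed.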
The core of this problem is inconsistent state in the service registry, so the priority should be to make service registration stable.