Using systemd to Control Linux Service Startup Order and Restart on Failure

Background

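Our team designed a LoRa gateway based on Armbian. After power-on it must start running its main program, packet_forwarder, which bridges LoRa <-> UDP to communicate with the server.
This should be a simple requirement: wrap the program in a service and load it into systemd. The rime_gateway.service unit file is as follows: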

[Unit]
Description=Rime LoRaWAN Gateway

[Service]
WorkingDirectory=/home/rime/packet_forwarder/lora_pkt_fwd
ExecStart=/home/rime/packet_forwarder/lora_pkt_fwd/start_gateway.sh
Restart=always

[Install]
WantedBy=multi-user.target

For an explanation of the syntax, see "Systemd 入门教程:命令篇" (Systemd Tutorial: Commands).

An Unstable Service

When started manually with systemctl start rime_gateway.service, the service works fine.

However, after Armbian boots on power-up, systemctl status rime_gateway.service shows that the service has stopped working:

rime_gateway.service - Rime LoRaWAN Gateway
   Loaded: loaded (/lib/systemd/system/rime_gateway.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2020-04-20 06:51:46 UTC; 29s ago
  Process: 1112 ExecStart=/home/rime/packet_forwarder/lora_pkt_fwd/start_gateway.sh (code=exited, status=1/FAILURE)
 Main PID: 1112 (code=exited, status=1/FAILURE)

Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=100ms expired, scheduling restart.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 5.
Apr 20 06:51:46 orangepizero systemd[1]: Stopped Rime LoRaWAN Gateway.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Start request repeated too quickly.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 06:51:46 orangepizero systemd[1]: Failed to start Rime LoRaWAN Gateway.

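The output above shows that the service was restarted too quickly ("Start request repeated too quickly"), so systemd gave up restarting it and left it in the failed state.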
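The "too quickly" message comes from systemd's start rate limiting. With stock defaults this allows 5 starts per 10-second window, which is consistent with the restart counter stopping at 5 here. As a rough sketch, the helper below prints the values systemd is actually using for a unit. The function name show_restart_settings is made up for illustration, and the property names are those reported by recent systemd releases, so they may differ on older ones.

```shell
# Print the restart and rate-limit settings systemd has in effect for a unit.
# show_restart_settings is a hypothetical helper name, not a systemd command.
show_restart_settings() {
    unit="$1"
    systemctl show "$unit" \
        -p Restart \
        -p RestartUSec \
        -p StartLimitBurst \
        -p StartLimitIntervalUSec
}
```

It would be called as show_restart_settings rime_gateway.service. Once a unit has hit the limit, systemctl reset-failed rime_gateway.service clears the failed state so that it can be started by hand again.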

The log from journalctl -u rime_gateway.service shows that systemd restarted the service 5 times, each after the 100 ms RestartSec delay, and every attempt failed:

-- Logs begin at Mon 2020-04-20 06:51:31 UTC, end at Mon 2020-04-20 06:55:01 UTC. --
Apr 20 06:51:40 orangepizero systemd[1]: Started Rime LoRaWAN Gateway.
Apr 20 06:51:40 orangepizero start_gateway.sh[572]: Reset start_gateway.sh
Apr 20 06:51:41 orangepizero start_gateway.sh[572]: Starting start_gateway.sh
Apr 20 06:51:41 orangepizero systemd[1]: rime_gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 06:51:41 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 06:51:41 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=100ms expired, scheduling restart.
Apr 20 06:51:41 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 1.

...

Apr 20 06:51:45 orangepizero start_gateway.sh[1112]: Reset start_gateway.sh
Apr 20 06:51:46 orangepizero start_gateway.sh[1112]: Starting start_gateway.sh
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=100ms expired, scheduling restart.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 5.
Apr 20 06:51:46 orangepizero systemd[1]: Stopped Rime LoRaWAN Gateway.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Start request repeated too quickly.
Apr 20 06:51:46 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 06:51:46 orangepizero systemd[1]: Failed to start Rime LoRaWAN Gateway.

The gateway's own log, viewed with tail -f /tmp/start_gateway.sh.log, shows that the failure occurs because the network is not yet up:

ERROR: [up] connect returned Network is unreachable

Adjusting the Startup Order

Clearly the service depends on the network being up, so the first step is to add the following line to the [Unit] section:

After=network.target

Did the ordering take effect? To find out, we exported the boot sequence and inspected it:

systemd-analyze plot > boot.svg

Opening boot.svg in the Chrome browser shows that network.target starts first and rime_gateway.service starts after it.
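On a headless gateway it can be more convenient to check the ordering as text instead of an SVG. A minimal sketch, assuming a helper name (show_unit_ordering) chosen only for illustration:

```shell
# Show what a unit is ordered after, and its position in the boot chain.
# show_unit_ordering is a hypothetical helper name.
show_unit_ordering() {
    unit="$1"
    # Units this one is ordered after (its After= dependencies, recursively).
    systemctl list-dependencies --after "$unit"
    # Time-critical chain of units leading up to this one.
    systemd-analyze critical-chain "$unit"
}
```

Running show_unit_ordering rime_gateway.service should list network.target among the units the service is ordered after, if the After= line was picked up.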

For more on startup ordering, see "Linux systemd启动守护进程,service启动顺序分析及调整service启动顺序".
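One caveat on this ordering: per systemd's documentation, network.target only signals that the network management stack has been started, not that an address or route is configured. A common complement is to also order the unit after network-online.target. The sketch below writes that as a drop-in file. The helper name and the file name 10-network-online.conf are chosen purely for illustration, and it assumes the distribution enables a wait-online service (for example systemd-networkd-wait-online.service or NetworkManager-wait-online.service), without which the target does not actually wait for connectivity.

```shell
# Write a drop-in that pulls in network-online.target and orders the unit after it.
# write_network_online_dropin and the file name are hypothetical choices.
write_network_online_dropin() {
    dropin_dir="$1"   # e.g. /etc/systemd/system/rime_gateway.service.d
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/10-network-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}
```

Run as root with the real drop-in directory, this would be followed by systemctl daemon-reload. It does not replace the restart settings, since the network can still drop after boot.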

Restarting on Failure

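To make the service more robust, it should be restarted automatically whenever it exits with a failure. We added the following settings.

Setting the start-limit interval to 0 in the [Unit] section disables start rate limiting, so systemd keeps trying to restart the service indefinitely: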

StartLimitIntervalSec=0

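Restarting at 1-second intervals (set in the [Service] section) is a good idea, because it avoids putting too much load on the server while the problem persists: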

RestartSec=1

For more on automatic restarts, see "使用systemd创建Linux服务".

A Stable Service

The final rime_gateway.service is shown below:

[Unit]
Description=Rime LoRaWAN Gateway
After=network.target
StartLimitIntervalSec=0

[Service]
WorkingDirectory=/home/rime/packet_forwarder/lora_pkt_fwd
ExecStart=/home/rime/packet_forwarder/lora_pkt_fwd/start_gateway.sh
Restart=always
RestartSec=1

[Install]
WantedBy=multi-user.target

Tip: after changing a unit file, run systemctl daemon-reload to reload it.
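The reload step is easy to forget, so it can help to bundle it with the restart and the status check. A minimal sketch, with apply_unit_change being a made-up helper name:

```shell
# Reload unit files, restart the service, and print its status.
# apply_unit_change is a hypothetical helper name.
apply_unit_change() {
    unit="$1"
    systemctl daemon-reload &&
    systemctl restart "$unit" &&
    systemctl --no-pager status "$unit"
}
```

It would be run as root, for example apply_unit_change rime_gateway.service. Each step runs only if the previous one succeeded, so a failed restart is not hidden behind a later status listing.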

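systemctl status rime_gateway.service and journalctl -u rime_gateway.service confirm that the service now starts normally.

To test the failure case, we unplugged the network cable and rebooted Armbian. systemd then restarted the service at 1-second intervals until the network came back (78 restarts in this run):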

-- Logs begin at Mon 2020-04-20 07:32:09 UTC, end at Mon 2020-04-20 07:35:12 UTC. --
Apr 20 07:32:19 orangepizero systemd[1]: Started Rime LoRaWAN Gateway.
Apr 20 07:32:20 orangepizero start_gateway.sh[839]: Reset start_gateway.sh
Apr 20 07:32:20 orangepizero start_gateway.sh[839]: Starting start_gateway.sh
Apr 20 07:32:20 orangepizero systemd[1]: rime_gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 07:32:20 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 07:32:21 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=1s expired, scheduling restart.
Apr 20 07:32:21 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 1.
Apr 20 07:32:21 orangepizero systemd[1]: Stopped Rime LoRaWAN Gateway.
Apr 20 07:32:21 orangepizero systemd[1]: Started Rime LoRaWAN Gateway.
Apr 20 07:32:22 orangepizero start_gateway.sh[991]: Reset start_gateway.sh
Apr 20 07:32:22 orangepizero start_gateway.sh[991]: Starting start_gateway.sh

...

Apr 20 07:34:54 orangepizero systemd[1]: rime_gateway.service: Main process exited, code=exited, status=1/FAILURE
Apr 20 07:34:54 orangepizero systemd[1]: rime_gateway.service: Failed with result 'exit-code'.
Apr 20 07:34:55 orangepizero systemd[1]: rime_gateway.service: Service RestartSec=1s expired, scheduling restart.
Apr 20 07:34:55 orangepizero systemd[1]: rime_gateway.service: Scheduled restart job, restart counter is at 78.
Apr 20 07:34:55 orangepizero systemd[1]: Stopped Rime LoRaWAN Gateway.
Apr 20 07:34:55 orangepizero systemd[1]: Started Rime LoRaWAN Gateway.
Apr 20 07:34:55 orangepizero start_gateway.sh[2644]: Reset start_gateway.sh
Apr 20 07:34:56 orangepizero start_gateway.sh[2644]: Starting start_gateway.sh
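In the log above the counter goes from 1 at 07:32:21 to 78 at 07:34:55, that is 77 cycles in 154 seconds or about 2 seconds per cycle: the 1-second RestartSec plus roughly a second for the script to run and fail. Rather than scrolling through the journal to find the last counter value, it can be extracted with a small filter. This is only a sketch, max_restart_counter is a made-up name, and it relies on the message wording shown in the log above.

```shell
# Read journal lines on stdin and print the highest restart counter seen.
# max_restart_counter is a hypothetical helper name.
max_restart_counter() {
    sed -n 's/.*restart counter is at \([0-9][0-9]*\)\.$/\1/p' |
        sort -n |
        tail -n 1
}
```

A typical call would be journalctl -u rime_gateway.service --no-pager | max_restart_counter.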

Author: KevinAshton
Source: https://www.cnblogs.com/rimelink/p/12738201.html
