Debugging crashes/hangs on Linux
We do our best to make the server as stable and reliable as possible. Nonetheless, some conditions never occur in our tests, such as different timings, printer responses or configurations, and these can cause the server to crash. To be clear, this does not mean a printer that stops communicating or a reconnect in your web browser. It means the backend server instance itself crashes. If that happens, you will see something like this in /var/log/syslog:
Jul 13 23:14:59 Repetier-Server kernel: [85421.242500] Alignment trap: not handling instruction e19c2f9f at [<76db67d4>]
Jul 13 23:14:59 Repetier-Server kernel: [85421.242515] Unhandled fault: alignment exception (0x001) at 0x72740061
Jul 13 23:14:59 Repetier-Server kernel: [85421.242529] pgd = b1820000
Jul 13 23:14:59 Repetier-Server kernel: [85421.242545]  *pgd=318c0835, *pte=3abf679f, *ppte=3abf6e7f
Jul 13 23:14:59 Repetier-Server systemd: RepetierServer.service: Main process exited, code=killed, status=6/ABRT
Jul 13 23:14:59 Repetier-Server systemd: RepetierServer.service: Unit entered failed state.
Jul 13 23:14:59 Repetier-Server systemd: RepetierServer.service: Failed with result 'signal'.
Jul 13 23:14:59 Repetier-Server systemd: RepetierServer.service: Service has no hold-off time, scheduling restart.
Jul 13 23:14:59 Repetier-Server systemd: Stopped Repetier-Server 3D Printer Server.
Jul 13 23:14:59 Repetier-Server systemd: Starting Repetier-Server 3D Printer Server...
Jul 13 23:15:00 Repetier-Server systemd: Started Repetier-Server 3D Printer Server.
This can abort a running print or cause a brief flash in the web browser. As you can see, the server is restarted automatically after a crash, so without checking syslog you will never know whether it was a crash or a connection problem.
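A quick way to check for such events is to search the log for the systemd message from the excerpt above. The following is only a minimal sketch: the function name find_server_crashes is made up for illustration, and it assumes that your system writes systemd messages to /var/log/syslog as in the excerpt.

```shell
#!/bin/sh
# Print every log line where systemd reports that the RepetierServer
# main process exited (the message shown in the syslog excerpt).
# find_server_crashes is a hypothetical helper name.
find_server_crashes() {
    log_file="${1:-/var/log/syslog}"   # optional first argument: another log file
    grep -F 'RepetierServer.service: Main process exited' "$log_file"
}
```

Calling find_server_crashes without an argument searches /var/log/syslog. Lines containing code=killed point to a crash; no output means the message was not found in that file. Reading the log may require root rights.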
To help us fix the problem in the next release, we need to find the location where the crash happened. For this, the server needs to run inside a debugger. The steps below describe how to do that and which information we need to find the cause. Everything is done in a Linux console. On a Raspberry Pi or similar board you can do this over an SSH connection, e.g. with PuTTY. On a regular Linux desktop you can simply open the terminal application.
First, you need to have the GNU debugger gdb installed. On Debian-based systems you can install it with
sudo apt-get update
sudo apt-get install gdb
Once it is installed, we start gdb and attach it to the running Repetier-Server. To do so we need the PID of the server, which we get like this:
root@FriendlyELEC:/var/lib/Repetier-Server/configs# ps aux | grep tier
repetie+ 928 0.6 2.9 260784 29236 ? Ssl Jul13 9:28 /usr/local/Repetier-Server/bin/RepetierServer -c /usr/local/Repetier-Server/etc/RepetierServer.xml --daemon
pi 23206 0.0 0.0 1376 376 pts/1 S+ 12:37 0:00 tail -f /var/lib/Repetier-Server/logs/server.log
root 27250 0.0 0.0 2064 508 pts/2 S+ 12:53 0:00 grep --color=auto tier
Look at the line containing /usr/local/Repetier-Server/bin/RepetierServer and note the first number, here 928, which is the PID. Now attach gdb to this process (as root or with sudo, since the server runs under its own user):
gdb -p 928
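If you prefer not to read the PID off the screen, the lookup can be scripted. This is a sketch under the assumption that ps aux prints the columns as in the listing above (PID in column 2, command in column 11); extract_server_pid is a hypothetical helper name, not part of Repetier-Server.

```shell
#!/bin/sh
# Read `ps aux` output on stdin and print the PID (column 2) of every
# process whose command (column 11) ends in /RepetierServer.
# extract_server_pid is a hypothetical helper name.
extract_server_pid() {
    awk '$11 ~ /\/RepetierServer$/ { print $2 }'
}

# Example (not executed here):
#   pid=$(ps aux | extract_server_pid)
#   sudo gdb -p "$pid"
```

Matching on the command column, rather than grepping the whole line, avoids picking up the grep or tail processes that also mention the server in their arguments.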
At this point the process is halted and you can analyse it. What you do next depends on the problem you want to debug. If you want to catch a crash that might happen later, simply continue by entering “c”. Once the server crashes, it will stop in the debugger. It is important to keep the console open. If it closes, the server might be paused or even stopped. With SSH this means the computer running the SSH client (e.g. a Windows PC) must not go into sleep mode. If the server is running but unresponsive, so you think it is hanging for some reason, continue directly with the analysis:
The first step is to check the active thread by entering bt:
(gdb) bt
#0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47
#1 0xf749f4ae in do_sigwait (set=<optimized out>, set@entry=0xffe3ef04, sig=sig@entry=0xffe3ef84) at ../sysdeps/unix/sysv/linux/sigwait.c:61
#2 0xf749f50a in __sigwait (set=0xffe3ef04, sig=0xffe3ef84) at ../sysdeps/unix/sysv/linux/sigwait.c:96
#3 0x0050e0a0 in Poco::Util::ServerApplication::waitForTerminationRequest() ()
#4 0x002fc080 in repetier::RepetierServerApplication::main(std::vector<std::string, std::allocator<std::string> > const&) ()
#5 0x00501886 in Poco::Util::Application::run() ()
#6 0x0050e3e0 in Poco::Util::ServerApplication::run(int, char**) ()
#7 0x002fabc6 in main ()
After a crash, this shows where the server crashed and the chain of function calls that led there.
Especially when the server hangs, it is also important to get the backtrace of all threads, so you run
(gdb) thread apply all bt
This returns a very long list with information about all threads, especially if you did many reloads.
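For the hang case, where you only need a snapshot, the two backtrace commands can also be run non-interactively and written to a file. The sketch below uses gdb's -p, -batch and -ex options. The function name dump_backtraces and the GDB override variable are hypothetical and exist only to make the example easy to try out. In batch mode gdb detaches and exits after the commands, so this is not suitable for waiting on a future crash.

```shell
#!/bin/sh
# Attach to a process, print the backtrace of the current thread and of
# all threads into a file, then let gdb detach and exit.
# Usage: dump_backtraces PID OUTPUT_FILE
# dump_backtraces and the GDB override variable are hypothetical names.
dump_backtraces() {
    pid="$1"
    out_file="$2"
    "${GDB:-gdb}" -p "$pid" -batch \
        -ex 'bt' \
        -ex 'thread apply all bt' > "$out_file" 2>&1
}

# Example (not executed here, needs root rights):
#   dump_backtraces 928 backtraces.txt
```

Writing the output to a file avoids copying a very long listing out of the terminal by hand.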
To find and resolve the problem you are experiencing, we need:
- What you were doing when it happened
- Any message gdb shows about why it stopped
- Backtrace of the active thread
- Backtrace of all threads
- If it only happens when specific files are processed, these files as well
- Last 100 lines of /var/lib/Repetier-Server/logs/server.log
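The log excerpt from the last item can be saved to a file with tail. This is a minimal sketch: save_log_tail and the default output file name are hypothetical, and the default log path is the one that appears in the ps output earlier in this article, so adjust it if your installation differs.

```shell
#!/bin/sh
# Copy the last 100 lines of the server log into a file that can be
# attached to the report.
# Usage: save_log_tail [LOG_FILE] [OUTPUT_FILE]
# save_log_tail and the default output name are hypothetical.
save_log_tail() {
    log_file="${1:-/var/lib/Repetier-Server/logs/server.log}"
    out_file="${2:-server-log-tail.txt}"
    tail -n 100 "$log_file" > "$out_file"
}
```

Run it as soon as possible after the problem occurs, so that the relevant lines are still among the last 100.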
Send all this to firstname.lastname@example.org and we will try to find the cause and fix it for the next release.