andersfylling/disgord

Do you want to work on this issue?

You can request for a bounty in order to promote it!

Deadlock during reconnect phase #132

ikkerens posted onGitHub

Describe the bug The bot that I have in my administration has recently frozen. I do not know for how long it was in this state.

Desktop (please complete the following information):

  • Golang version: go version go1.12.3 linux/amd64
  • Using Go modules? yes
  • Using Disgord v0.10.3

Additional context Goroutine dump showing the deadlock between goroutine 34 and 34119, 245 minutes long.

goroutine 35375 [running]:
runtime/pprof.writeGoroutineStacks(0xa0efc0, 0xc0002427e0, 0x40d63f, 0xc00053ec30)
    /usr/local/go/src/runtime/pprof/pprof.go:678 +0xa7
runtime/pprof.writeGoroutine(0xa0efc0, 0xc0002427e0, 0x2, 0xc00018a700, 0x0)
    /usr/local/go/src/runtime/pprof/pprof.go:667 +0x44
runtime/pprof.(*Profile).WriteTo(0xdb9180, 0xa0efc0, 0xc0002427e0, 0x2, 0xc0002427e0, 0x988400)
    /usr/local/go/src/runtime/pprof/pprof.go:328 +0x3e4
net/http/pprof.handler.ServeHTTP(0xc0000286d1, 0x9, 0xa13e40, 0xc0002427e0, 0xc0001cec00)
    /usr/local/go/src/net/http/pprof/pprof.go:245 +0x210
net/http/pprof.Index(0xa13e40, 0xc0002427e0, 0xc0001cec00)
    /usr/local/go/src/net/http/pprof/pprof.go:268 +0x723
net/http.HandlerFunc.ServeHTTP(0x9a5028, 0xa13e40, 0xc0002427e0, 0xc0001cec00)
    /usr/local/go/src/net/http/server.go:1964 +0x44
net/http.(*ServeMux).ServeHTTP(0xdc6fc0, 0xa13e40, 0xc0002427e0, 0xc0001cec00)
    /usr/local/go/src/net/http/server.go:2361 +0x127
net/http.serverHandler.ServeHTTP(0xc00013a000, 0xa13e40, 0xc0002427e0, 0xc0001cec00)
    /usr/local/go/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc00009ed20, 0xa14380, 0xc0000621c0)
    /usr/local/go/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
    /usr/local/go/src/net/http/server.go:2851 +0x2f5

goroutine 1 [chan receive, 22345 minutes]:
github.com/andersfylling/disgord.(*Client).DisconnectOnInterrupt(0xc0000d02c0, 0x0, 0x0)
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/client.go:467 +0xf0
github.com/andersfylling/disgord.(*Client).StayConnectedUntilInterrupted(0xc0000d02c0, 0x98ba1b, 0x13)
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/client.go:480 +0x108
main.main()
    /var/git/LPBot/main.go:38 +0x380

goroutine 19 [syscall, 22345 minutes]:
os/signal.signal_recv(0x0)
    /usr/local/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
    /usr/local/go/src/os/signal/signal_unix.go:29 +0x41

goroutine 21 [IO wait]:
internal/poll.runtime_pollWait(0x7ffba56a4f00, 0x72, 0x0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000020118, 0x72, 0xc000094000, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000020118, 0xffffffffffffff00, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc000020100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:384 +0x1a0
net.(*netFD).accept(0xc000020100, 0x40d63f, 0xc00009ed20, 0xa0)
    /usr/local/go/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc00000e030, 0xc00004fdf0, 0xccf192b, 0x8f1e170a535e6472)
    /usr/local/go/src/net/tcpsock_posix.go:139 +0x2e
net.(*TCPListener).AcceptTCP(0xc00000e030, 0xc00004fe18, 0x4885c6, 0x5cc61d99)
    /usr/local/go/src/net/tcpsock.go:247 +0x47
net/http.tcpKeepAliveListener.Accept(0xc00000e030, 0xc00004fe68, 0x18, 0xc000074d80, 0x684aa5)
    /usr/local/go/src/net/http/server.go:3232 +0x2f
net/http.(*Server).Serve(0xc00013a000, 0xa141c0, 0xc00000e030, 0x0, 0x0)
    /usr/local/go/src/net/http/server.go:2826 +0x22f
net/http.(*Server).ListenAndServe(0xc00013a000, 0xc00013a000, 0x0)
    /usr/local/go/src/net/http/server.go:2764 +0xb6
net/http.ListenAndServe(0x982c6a, 0x5, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/server.go:3004 +0x74
main.init.0.func1()
    /var/git/LPBot/main.go:19 +0x3e
created by main.init.0
    /var/git/LPBot/main.go:18 +0x35

goroutine 34 [select, 245 minutes]:
github.com/andersfylling/disgord.(*WSShardManager).Prepare.func1(0xc0001160c0, 0xc000066660)
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/shard.go:110 +0xcf
created by github.com/andersfylling/disgord.(*WSShardManager).Prepare
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/shard.go:108 +0x1d8

goroutine 35 [select]:
github.com/andersfylling/disgord.demultiplexer(0xc0000a4370, 0xc000066600, 0xc0000a4320)
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/reactor.go:49 +0xd5
created by github.com/andersfylling/disgord.(*Client).setupConnectEnv
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/client.go:395 +0x202

goroutine 34118 [select]:
github.com/andersfylling/disgord/websocket.(*client).pulsate(0xc000001980, 0xa14380, 0xc000094dc0)
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/websocket/client.go:753 +0x349
github.com/andersfylling/disgord/websocket.(*client).prepareHeartbeating(0xc000001980, 0xa14380, 0xc000094dc0)
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/websocket/client.go:708 +0x16d
created by github.com/andersfylling/disgord/websocket.(*client).connect
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/websocket/client.go:322 +0x4c1

goroutine 34116 [select]:
github.com/andersfylling/disgord/websocket.(*client).emitter(0xc000001980, 0xa14380, 0xc000094dc0)
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/websocket/client.go:566 +0x182
created by github.com/andersfylling/disgord/websocket.(*client).connect
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/websocket/client.go:320 +0x44f

goroutine 34101 [select]:
github.com/andersfylling/disgord/websocket.(*client).operationHandlers(0xc000001980, 0xa14380, 0xc000094dc0)
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/websocket/client.go:211 +0x173
created by github.com/andersfylling/disgord/websocket.(*client).startBehaviors
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/websocket/client.go:200 +0xff

goroutine 35395 [IO wait]:
internal/poll.runtime_pollWait(0x7ffba56a4c90, 0x72, 0xc000484e58)
    /usr/local/go/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc0000feb98, 0x72, 0xffffffffffffff00, 0xa0fec0, 0xd89640)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc0000feb98, 0xc00011cc00, 0x1, 0x1)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc0000feb80, 0xc00011ccd1, 0x1, 0x1, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:169 +0x179
net.(*netFD).Read(0xc0000feb80, 0xc00011ccd1, 0x1, 0x1, 0xc000062a18, 0xc000484f68, 0x470965)
    /usr/local/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc000516008, 0xc00011ccd1, 0x1, 0x1, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:177 +0x68
net/http.(*connReader).backgroundRead(0xc00011ccc0)
    /usr/local/go/src/net/http/server.go:676 +0x5a
created by net/http.(*connReader).startBackgroundRead
    /usr/local/go/src/net/http/server.go:672 +0xd2

goroutine 34119 [select, 245 minutes]:
github.com/andersfylling/disgord/websocket.(*client).connect.func1(0xa14380, 0xc000094dc0, 0xc000001980)
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/websocket/client.go:324 +0xbe
created by github.com/andersfylling/disgord/websocket.(*client).connect
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/websocket/client.go:323 +0x4fa

goroutine 34625 [IO wait]:
internal/poll.runtime_pollWait(0x7ffba56a4e30, 0x72, 0xc0004c8870)
    /usr/local/go/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000261e98, 0x72, 0xffffffffffffff00, 0xa0fec0, 0xd89640)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000261e98, 0xc00056c000, 0x2000, 0x2000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc000261e80, 0xc00056c000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:169 +0x179
net.(*netFD).Read(0xc000261e80, 0xc00056c000, 0x2000, 0x2000, 0x40a52b, 0xc000010000, 0x8f4320)
    /usr/local/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc00000f1f8, 0xc00056c000, 0x2000, 0x2000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:177 +0x68
crypto/tls.(*block).readFromUntil(0xc00053e4b0, 0xa0ed40, 0xc00000f1f8, 0x5, 0xc00000f1f8, 0x0)
    /usr/local/go/src/crypto/tls/conn.go:492 +0x89
crypto/tls.(*Conn).readRecord(0xc000073180, 0x9a5517, 0xc0000732a0, 0x40d1b3)
    /usr/local/go/src/crypto/tls/conn.go:593 +0xdd
crypto/tls.(*Conn).Read(0xc000073180, 0xc0005b1000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/crypto/tls/conn.go:1145 +0xf1
bufio.(*Reader).Read(0xc00025ff80, 0xc000242658, 0x9, 0x9, 0xc00018a700, 0xc0004c8c58, 0xc0004c8c70)
    /usr/local/go/src/bufio/bufio.go:216 +0x22f
io.ReadAtLeast(0xa0ddc0, 0xc00025ff80, 0xc000242658, 0x9, 0x9, 0x9, 0xc0004c8cf8, 0x66b8eb, 0xc0000939e0)
    /usr/local/go/src/io/io.go:310 +0x88
io.ReadFull(0xa0ddc0, 0xc00025ff80, 0xc000242658, 0x9, 0x9, 0xc0004c8d28, 0x66b7fd, 0xc0004c8fb8)
    /usr/local/go/src/io/io.go:329 +0x58
net/http.http2readFrameHeader(0xc000242658, 0x9, 0x9, 0xa0ddc0, 0xc00025ff80, 0x0, 0x0, 0xc00048e030, 0x0)
    /usr/local/go/src/net/http/h2_bundle.go:1545 +0x7b
net/http.(*http2Framer).ReadFrame(0xc000242620, 0xc00048e030, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/h2_bundle.go:1803 +0xa3
net/http.(*http2clientConnReadLoop).run(0xc0004c8fb8, 0x9a4e30, 0xc00003f7b8)
    /usr/local/go/src/net/http/h2_bundle.go:8261 +0x9e
net/http.(*http2ClientConn).readLoop(0xc0001a4380)
    /usr/local/go/src/net/http/h2_bundle.go:8189 +0x76
created by net/http.(*http2Transport).newClientConn
    /usr/local/go/src/net/http/h2_bundle.go:7264 +0x636

goroutine 34115 [IO wait]:
internal/poll.runtime_pollWait(0x7ffba56a4d60, 0x72, 0xc00004a8c0)
    /usr/local/go/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc00044a298, 0x72, 0xffffffffffffff00, 0xa0fec0, 0xd89640)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc00044a298, 0xc00058e000, 0x1000, 0x1000)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc00044a280, 0xc00058e000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:169 +0x179
net.(*netFD).Read(0xc00044a280, 0xc00058e000, 0x1000, 0x1000, 0x40a52b, 0xc000010000, 0x8f4320)
    /usr/local/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc00000e2b8, 0xc00058e000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:177 +0x68
crypto/tls.(*block).readFromUntil(0xc0004e8630, 0xa0ed40, 0xc00000e2b8, 0x5, 0xc00000e2b8, 0x0)
    /usr/local/go/src/crypto/tls/conn.go:492 +0x89
crypto/tls.(*Conn).readRecord(0xc000072700, 0x9a5517, 0xc000072820, 0x20)
    /usr/local/go/src/crypto/tls/conn.go:593 +0xdd
crypto/tls.(*Conn).Read(0xc000072700, 0xc0004cc000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/crypto/tls/conn.go:1145 +0xf1
bufio.(*Reader).fill(0xc0005f8960)
    /usr/local/go/src/bufio/bufio.go:100 +0x106
bufio.(*Reader).Peek(0xc0005f8960, 0x2, 0xc000332338, 0xc00004ad28, 0x7019a2, 0xc000332338, 0xc00027601f)
    /usr/local/go/src/bufio/bufio.go:132 +0x3f
github.com/gorilla/websocket.(*Conn).read(0xc0000f4420, 0x2, 0x1, 0x4, 0xc00027601f, 0xc000332338, 0x6ffaf9)
    /usr/lib/go/pkg/mod/github.com/gorilla/websocket@v1.4.0/conn.go:357 +0x40
github.com/gorilla/websocket.(*Conn).advanceFrame(0xc0000f4420, 0x0, 0x0, 0x406f45)
    /usr/lib/go/pkg/mod/github.com/gorilla/websocket@v1.4.0/conn.go:787 +0x5c
github.com/gorilla/websocket.(*Conn).NextReader(0xc0000f4420, 0x4, 0xc00004afb0, 0xc000221080, 0xc00004af08, 0x406c9f)
    /usr/lib/go/pkg/mod/github.com/gorilla/websocket@v1.4.0/conn.go:947 +0xa0
github.com/gorilla/websocket.(*Conn).ReadMessage(0xc0000f4420, 0x0, 0xc000540000, 0xc000080050, 0xc000540001, 0xc000540000, 0xc000540000)
    /usr/lib/go/pkg/mod/github.com/gorilla/websocket@v1.4.0/conn.go:1028 +0x2f
github.com/andersfylling/disgord/websocket.(*gorilla).Read(0xc00023c640, 0xc0004b4540, 0x600, 0x8de560, 0xc000063200, 0x0)
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/websocket/websocket_gorilla.go:75 +0x4c
github.com/andersfylling/disgord/websocket.(*client).receiver(0xc000001980, 0xa14380, 0xc000094dc0)
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/websocket/client.go:632 +0xe0
created by github.com/andersfylling/disgord/websocket.(*client).connect
    /usr/lib/go/pkg/mod/github.com/andersfylling/disgord@v0.10.3/websocket/client.go:319 +0x416

Just for note taking:

  • not using voice, event client only.
  • bot was alive for 22,346 minutes (~15.5 days) before bug occurred
  • bot is still shown as connected. But does not seem to react to events (...)

the goroutines

goroutine 34119 is created on connecting to the discord gateway. It only listens for a ctx.Done() to trigger or if the shared system shutdown channel is closed, to properly disconnect the shard. The select statement in itself should not be an issue; the only code triggered is for system shutdown. Since this bug doesn't appear due to a system shutdown, I assume this goroutine running for 245 minutes just indicates how long this shard was connected before the bug occurred.

Extra note: the shutdown logic does not contact the logic in shard.go, where goroutine 34 takes place.

goroutine 34 is created on client start up, if socketing is used. The select statement listen for the shutdown channel to get closed (this is not the trigger, as previously explained), and for shards to request "permission" to connect to discord. Asking for permission here simply means to follow the rate limit rules, as of now.

A issue discovered for later development is that the A channel does not quit the loop when being closed:

case b, ok := <-a:
  if !ok {
    s.log.Error("b is closed")
    continue
  }

Since the only cases here are A and shutdown, this should exit the goroutine as the main channel is closed. And there's no point in having a go routine that only listens for shutdown events. Currently A is never closed anywhere so this should not be the issue.

reason behind go routine 34

Since goroutine 34119 only listens for closing of channels and then stops, I will only worry about go routine 34 here

Currently the system for connecting to discord with N shards is for the shards to request permission to connect from the shard manager. go routine 34 is where the permission requests are handled. Essentially the shard manager has an object K that can only have one owner. K is given to one of the shards that request permission to connect; once that shard has connected, or failed to connect it releases K back to the shard manager allowing other shards to connect. Each shard can then take as much time as they need (this step is repeated for each reconnect, so it does not block forever).

posted by andersfylling almost 6 years ago

Fund this Issue

$2.00
Funded
Only logged in users can fund an issue

Pull requests

Recent activities

andersfylling funded 2.00 for andersfylling/disgord# 132
almost 6 years ago