48 stories
·
0 followers

Slack's Secret STDERR Messages

1 Share
These are rough notes. I run the Slack messaging application on Ubuntu Linux, and it recently started mysteriously crashing. I'd Alt-Tab and find it was no longer there. No error message, no dialog, just gone. It usually happened when locking and unlocking the screen. A quick internet search revealed nothing. These are rough notes for how I debugged it, in case it's useful for someone searching on this topic. I spend many hours documenting advanced debugging stories for books, talks, and blog posts, but many things I never have time to share. I'm experimenting with this "rough notes" format as a way to quickly share things. No editing, spell checking, or comments. Mostly screenshots. Dead ends included. Note that I don't know anything about Slack internals, and there may be better ways to solve this. ## 1. Enabling core dumps I'm guessing it's core dumping and Ubuntu's apport is eating them. Redirecting them to the file system so I can then do core dump analysis using gdb(1), as root:
# cat /proc/sys/kernel/core_pattern
|/usr/share/apport/apport %p %s %c %d %P
# mkdir /var/cores
# echo "/var/cores/core.%e.%p.%h.%t" > /proc/sys/kernel/core_pattern
[...another crash...]
# ls /var/cores
#
This didn't work: No core file showed up. I may need to increase the core file size ulimits for Slack, but that might mean mucking around with its startup scripts; I'll try some other tracing first. ## 2. exitsnoop Using an eBPF/bcc tool to look for exit reasons:
# exitsnoop -t
TIME-AEST    PCOMM            PID    PPID   TID    AGE(s)  EXIT_CODE 
13:51:19.432 kworker/dying    3663305 2      3663305 1241.59 0
13:51:30.948 kworker/dying    3663626 2      3663626 835.76  0
13:51:33.296 systemd-udevd    3664149 2054939 3664149 3.55    0
13:53:09.256 kworker/dying    3662179 2      3662179 2681.15 0
13:53:25.636 kworker/dying    3663520 2      3663520 1122.60 0
13:53:30.705 grep             3664239 6009   3664239 0.08    0
13:53:30.705 ps               3664238 6009   3664238 0.08    0
13:53:40.297 slack            3663135 1786   3663135 1459.54 signal 6 (ABRT)
13:53:40.298 slack            3663208 3663140 3663208 1457.86 0
13:53:40.302 slack            3663140 1786   3663140 1459.18 0
13:53:40.302 slack            3663139 1786   3663139 1459.18 0
13:53:40.303 slack            3663171 1786   3663171 1458.22 0
13:53:40.317 slack            3663197 1786   3663197 1458.03 0
13:53:44.827 gdm-session-wor  3664269 1778   3664269 0.02    0
[...]
This traced a Slack SIGABRT which happened around the same time as a crash. A strong lead. ## 3. killsnoop Running killsnoop (eBPF/bcc) to get more info:
# killsnoop
TIME      PID    COMM             SIG  TPID   RESULT
13:45:01  2053366 systemd-journal  0    1024   0
13:45:01  2053366 systemd-journal  0    3663525 -3
13:45:01  2053366 systemd-journal  0    3663528 -3
13:49:00  2054939 systemd-udevd    15   3664053 0
13:51:33  2054939 systemd-udevd    15   3664149 0
13:53:44  2053366 systemd-journal  0    4265   -1
13:53:44  2053366 systemd-journal  0    972    0
13:53:44  2053366 systemd-journal  0    1778   0
13:53:44  2053366 systemd-journal  0    6414   -1
[...]
A crash happened, but killsnoop(8) didn't see it. A quick look at the killsnoop(8) source reminded me that I wrote it back in 2015, which is practically ancient in eBPF years. Back then there wasn't tracepoint support yet so I was using kprobes for everything. Kprobes aren't a stable interface, which might be the problem. ## 4. perf trace Nowadays this can be done as a perf one-liner:
# perf list syscalls:sys_enter_*kill

List of pre-defined events (to be used in -e):

  syscalls:sys_enter_kill                            [Tracepoint event]
  syscalls:sys_enter_tgkill                          [Tracepoint event]
  syscalls:sys_enter_tkill                           [Tracepoint event]

# perf trace -e 'syscalls:sys_enter_*kill'
 15755.483 slack/3684015 syscalls:sys_enter_tgkill(tgid: 3684015 (slack), pid: 3684015 (slack), sig: ABRT)
Ok, so there's our slack SIGABRT, sent via tgkill(2). (And I filed an issue to update bcc killsnoop(8) to use tracepoints.) This output doesn't really tell me much more about it though. I want to see a stack trace. I can use perf record or bpftrace...and that reminds me, didn't I write another signal tool using bpftrace? ## 5. signals.bt The signals.bt bpftrace tool from my BPF book traces the signal:signal_generate tracepoint, which should catch every type of generated signal, including tgkill(2). Trying it out:
# bpftrace /home/bgregg/Git/bpf-perf-tools-book/originals/Ch13_Applications/signals.bt
Attaching 3 probes...
Counting signals. Hit Ctrl-C to end.
^C
@[SIGNAL, PID, COMM] = COUNT

@[SIGPIPE, 1883, Xorg]: 1
@[SIGCHLD, 1797, dbus-daemon]: 1
@[SIGINT, 3665167, bpftrace]: 1
@[SIGTERM, 3665198, gdm-session-wor]: 1
@[SIGCHLD, 3665197, gdm-session-wor]: 1
@[SIGABRT, 3664940, slack]: 1
@[SIGTERM, 3665197, gdm-session-wor]: 1
@[SIGKILL, 3665207, dbus-daemon]: 1
@[SIGWINCH, 859450, bash]: 2
@[SIGCHLD, 1778, gdm-session-wor]: 2
@[, 3665201, gdbus]: 2
@[, 3665199, gmain]: 2
@[SIGWINCH, 3665167, bpftrace]: 2
@[SIGWINCH, 3663319, vi]: 2
@[SIGCHLD, 1786, systemd]: 6
@[SIGALRM, 1883, Xorg]: 106
Ok, there's the SIGABRT for slack. (There's a new sigsnoop(8) tool for bcc that uses this tracepoint as well.) ## 6. Signal Stacks Moving from signals.bt to a bpftrace one-liner:
# bpftrace -e 't:signal:signal_generate /comm == "slack"/ { printf("%d, %s%s\n", args->sig, kstack, ustack); }'
Attaching 1 probe...
6, 
        __send_signal+579
        __send_signal+579
        send_signal+221
        do_send_sig_info+81
        do_send_specific+110
        do_tkill+171
        __x64_sys_tgkill+34
        do_syscall_64+73
        entry_SYSCALL_64_after_hwframe+68

        0x7f4a2e2e2f95
This was supposed to print both the kernel and user stack traces that led to the SIGABRT, but the user stack is broken, showing 0x7f4a2e2e2f95 only. Grumble. There's ways to fix this, but it usually gets time consuming, so let me try something else first. Logs! ## 7. Logs Does Slack have logs? I have no idea. Maybe they contain the error message.
# lsof -p `pgrep -n slack` | grep -i log
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
      Output information may be incomplete.
lsof: WARNING: can't stat() fuse file system /run/user/1000/doc
      Output information may be incomplete.
Ignore the lsof(8) warnings. There's no output here, nothing containing "log". Although I'm looking at the most recent process called "slack" and maybe that's the wrong one.
# pstree -ps `pgrep -n slack`
systemd(1)───systemd(1786)───slack(3666477)───slack(3666481)───slack(3666548)─┬─{slack}(3666549)
                                                                              ├─{slack}(3666551)
                                                                              ├─{slack}(3666552)
                                                                              ├─{slack}(3666553)
                                                                              ├─{slack}(3666554)
                                                                              ├─{slack}(3666555)
                                                                              ├─{slack}(3666556)
                                                                              ├─{slack}(3666557)
                                                                              ├─{slack}(3666558)
                                                                              ├─{slack}(3666559)
                                                                              ├─{slack}(3666560)
                                                                              ├─{slack}(3666564)
                                                                              ├─{slack}(3666566)
                                                                              ├─{slack}(3666568)
                                                                              └─{slack}(3666609)
Ok, how about I try the great-grandparent, PID 3666477:
# lsof -p 3666477 | grep -i log
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
      Output information may be incomplete.
lsof: WARNING: can't stat() fuse file system /run/user/1000/doc
      Output information may be incomplete.
slack   3666477 bgregg   37r    REG      253,1     32768   140468 /home/bgregg/.local/share/gvfs-metadata/home-8fd8d123.log
slack   3666477 bgregg   40r    REG      253,1     32768   131314 /home/bgregg/.local/share/gvfs-metadata/trash:-85854456.log
slack   3666477 bgregg   71w    REG      253,1     15566  1707316 /home/bgregg/.config/Slack/Local Storage/leveldb/013430.log
slack   3666477 bgregg   72w    REG      253,1       347  1704816 /home/bgregg/.config/Slack/Local Storage/leveldb/LOG
slack   3666477 bgregg   73w    REG      253,1   2324236  1718407 /home/bgregg/.config/Slack/logs/browser.log
slack   3666477 bgregg   90w    REG      253,1    363600  1713625 /home/bgregg/.config/Slack/Service Worker/Database/000004.log
slack   3666477 bgregg   92w    REG      253,1       274  1704249 /home/bgregg/.config/Slack/Service Worker/Database/LOG
slack   3666477 bgregg  108w    REG      253,1   4182513  1749672 /home/bgregg/.config/Slack/logs/webapp-service-worker-console.log
slack   3666477 bgregg  116w    REG      253,1       259  1704369 /home/bgregg/.config/Slack/Session Storage/LOG
slack   3666477 bgregg  122w    REG      253,1     31536  1749325 /home/bgregg/.config/Slack/Session Storage/000036.log
slack   3666477 bgregg  126w    REG      253,1   3970909  1704566 /home/bgregg/.config/Slack/logs/webapp-console.log
slack   3666477 bgregg  127w    REG      253,1   2330006  1748923 /home/bgregg/.config/Slack/IndexedDB/https_app.slack.com_0.indexeddb.leveldb/023732.log
slack   3666477 bgregg  131w    REG      253,1       330  1704230 /home/bgregg/.config/Slack/IndexedDB/https_app.slack.com_0.indexeddb.leveldb/LOG
slack   3666477 bgregg  640r    REG      253,1     32768   140378 /home/bgregg/.local/share/gvfs-metadata/root-7d269acf.log (deleted)
Lots of logs in ~/config/Slack/logs!
# cd ~/.config/Slack/logs
# ls -lrth
total 26M
-rw-rw-r-- 1 bgregg bgregg 5.0M Aug 20 07:54 webapp-service-worker-console1.log
-rw-rw-r-- 1 bgregg bgregg 5.1M Aug 23 19:30 webapp-console2.log
-rw-rw-r-- 1 bgregg bgregg 5.1M Aug 25 16:07 webapp-console1.log
drwxrwxr-x 2 bgregg bgregg 4.0K Aug 27 14:34 recorded-trace/
-rw-rw-r-- 1 bgregg bgregg 4.0M Aug 27 14:46 webapp-service-worker-console.log
-rw-rw-r-- 1 bgregg bgregg 2.3M Aug 27 14:46 browser.log
-rw-rw-r-- 1 bgregg bgregg 3.9M Aug 27 14:46 webapp-console.log
Ok, so how about this one:
# cat webapp-console.log
[...]
[08/27/21, 14:46:36:238] info: [API-Q] (TKZ41AXQD) 614b3789-1630039595.801 dnd.teamInfo is RESOLVED 
[08/27/21, 14:46:36:240] info: [API-Q] (TKZ41AXQD) 614b3789-1630039595.930 users.interactions.list is RESOLVED 
[08/27/21, 14:46:36:242] info: [DND] (TKZ41AXQD) Fetched DND info for the following member: ULG5H012L 
[08/27/21, 14:46:36:251] info: [DND] (TKZ41AXQD) Checking for changes in DND status for the following members: ULG5H012L 
[08/27/21, 14:46:36:254] info: [DND] (TKZ41AXQD) Will check for changes in DND status again in 5 minutes 
[08/27/21, 14:46:36:313] info: [DND] (TKZ41AXQD) Fetched DND info for the following members: UL0US3455 
[08/27/21, 14:46:36:313] info: [DND] (TKZ41AXQD) Checking for changes in DND status for the following members: ULG5H012L,UL0US3455 
[08/27/21, 14:46:36:314] info: [DND] (TKZ41AXQD) Will check for changes in DND status again in 5 minutes 
[08/27/21, 14:46:37:337] info: [FOCUS-EVENT] Client window blurred 
[08/27/21, 14:46:40:022] info: [RTM] (T029N2L97) Processed 1 user_typing event(s) in channel(s) C0S9267BE over 0.10ms 
[08/27/21, 14:46:40:594] info: [RTM] (T029N2L97) Processed 1 message:message_replied event(s) in channel(s) C0S9267BE over 2.60ms 
[08/27/21, 14:46:40:595] info: [RTM] Setting a timeout of 37 ms to process more rtm events 
[08/27/21, 14:46:40:633] info: [RTM] Waited 37 ms, processing more rtm events now 
[08/27/21, 14:46:40:653] info: [RTM] (T029N2L97) Processed 1 message event(s) in channel(s) C0S9267BE over 18.60ms 
[08/27/21, 14:46:44:938] info: [RTM] (T029N2L97) Processed 1 user_typing event(s) in channel(s) C0S9267BE over 0.00ms 
No, I don't see any errors jumping out at me. How about searching for errors:
# egrep -i 'error|fail' webapp-console.log
[08/25/21, 16:07:13:051] info: [DESKTOP-SIDE-EFFECT] (TKZ41AXQD) Reacting to  {"type":"[39] Set a value that represents whether we are currently in the boot phase","payload":false,"error":false} 
[08/25/21, 16:07:13:651] info: [DESKTOP-SIDE-EFFECT] (T7GLTMS0P) Reacting to  {"type":"[39] Set a value that represents whether we are currently in the boot phase","payload":false,"error":false} 
[08/25/21, 16:07:14:249] info: [DESKTOP-SIDE-EFFECT] (T0DS04W11) Reacting to  {"type":"[39] Set a value that represents whether we are currently in the boot phase","payload":false,"error":false} 
[08/25/21, 16:07:14:646] info: [DESKTOP-SIDE-EFFECT] (T0375HBGA) Reacting to  {"type":"[39] Set a value that represents whether we are currently in the boot phase","payload":false,"error":false} 
[...]
# egrep -i 'error|fail' browser.log 
[07/16/21, 08:18:27:621] error: Cannot override webPreferences key(s): webviewTag, nativeWindowOpen, nodeIntegration, nodeIntegrationInWorker, nodeIntegrationInSubFrames, enableRemoteModule, contextIsolation, sandbox 
[07/16/21, 08:18:27:653] error: Failed to load empty window url in window 
  "error": {
    "stack": "Error: ERR_ABORTED (-3) loading 'about:blank'\n    at rejectAndCleanup (electron/js2c/browser_init.js:217:1457)\n    at Object.navigationListener (electron/js2c/browser_init.js:217:1763)\n    at Object.emit (events.js:315:20)\n    at Object.EventEmitter.emit (domain.js:467:12)",
[07/16/21, 08:18:31:355] error: Cannot override webPreferences key(s): webviewTag, nativeWindowOpen, nodeIntegration, nodeIntegrationInWorker, nodeIntegrationInSubFrames, enableRemoteModule, contextIsolation, sandbox 
[07/16/21, 08:18:31:419] error: Cannot override webPreferences key(s): webviewTag, nativeWindowOpen, nodeIntegration, nodeIntegrationInWorker, nodeIntegrationInSubFrames, enableRemoteModule, contextIsolation, sandbox 
[07/24/21, 09:00:52:252] error: Failed to load calls-desktop-interop.WindowBorderPanel 
  "error": {
    "stack": "Error: Module did not self-register: '/snap/slack/42/usr/lib/slack/resources/app.asar.unpacked/node_modules/@tinyspeck/calls-desktop-interop/build/Release/CallsDesktopInterop.node'.\n    at process.func [as dlopen] (electron/js2c/asar_bundle.js:5:1846)\n    at Object.Module._extensions..node (internal/modules/cjs/loader.js:1138:18)\n    at Object.func [as .node] (electron/js2c/asar_bundle.js:5:2073)\n    at Module.load (internal/modules/cjs/loader.js:935:32)\n    at Module._load (internal/modules/cjs/loader.js:776:14)\n    at Function.f._load (electron/js2c/asar_bundle.js:5:12684)\n    at Module.require (internal/modules/cjs/loader.js:959:19)\n    at require (internal/modules/cjs/helpers.js:88:18)\n    at bindings (/snap/slack/42/usr/lib/slack/resources/app.asar/node_modules/bindings/bindings.js:112:48)\n    at Object. (/snap/slack/42/usr/lib/slack/resources/app.asar/node_modules/@tinyspeck/calls-desktop-interop/lib/index.js:1:34)",
[07/24/21, 09:00:52:260] warn: Failed to install protocol handler for slack:// links 
[07/24/21, 09:00:52:440] error: Cannot override webPreferences key(s): webviewTag 
[...]
I browsed the logs for a while but didn't see a smoking gun. Surely it spits out some error message when crashing, like to STDERR... ## 8. STDERR Tracing Where is STDERR written?
# lsof -p 3666477
[...]
slack   3666477 bgregg  mem     REG               7,16    141930     7165 /snap/slack/44/usr/lib/slack/chrome_100_percent.pak
slack   3666477 bgregg  mem     REG               7,16    165680     7433 /snap/slack/44/usr/lib/slack/v8_context_snapshot.bin
slack   3666477 bgregg    0r    CHR                1,3       0t0        6 /dev/null
slack   3666477 bgregg    1w    CHR                1,3       0t0        6 /dev/null
slack   3666477 bgregg    2w    CHR                1,3       0t0        6 /dev/null
slack   3666477 bgregg    3r   FIFO               0,12       0t0 29532192 pipe
slack   3666477 bgregg    4u   unix 0x00000000134e3c45       0t0 29526717 type=SEQPACKET
slack   3666477 bgregg    5r    REG               7,16  10413488     7167 /snap/slack/44/usr/lib/slack/icudtl.dat
[...]
/dev/null? Like that's going to stop me. I could trace writes to STDERR, but I think my old shellsnoop(8) tool (another from eBPF/bcc) already does that:
# shellsnoop 3666477
[...]
[08/27/21, 14:46:36:314] info: [DND] (TKZ41AXQD) Will check for changes in DND status again in 5 minutes 
[08/27/21, 14:46:37:337] info: [FOCUS-EVENT] Client window blurred 
[08/27/21, 14:46:40:022] info: [RTM] (T029N2L97) Processed 1 user_typing event(s) in channel(s) C0S928EBE over 0.10ms 
[08/27/21, 14:46:40:594] info: [RTM] (T029N2L97) Processed 1 message:message_replied event(s) in channel(s) C0S928EBE over 2.60ms 
[08/27/21, 14:46:40:595] info: [RTM] Setting a timeout of 37 ms to process more rtm events 
[08/27/21, 14:46:40:633] info: [RTM] Waited 37 ms, processing more rtm events now 
[08/27/21, 14:46:40:653] info: [RTM] (T029N2L97) Processed 1 message event(s) in channel(s) C0S928EBE over 18.60ms 


[08/27/21, 14:46:44:938] info: [RTM] (T029N2L97) Processed 1 user_typing event(s) in channel(s) C0S928EBE over 0.00ms 

(slack:3666477): Gtk-WARNING **: 14:46:45.525: Could not load a pixbuf from icon theme.
This may indicate that pixbuf loaders or the mime database could not be found.
**
Gtk:ERROR:../../../../gtk/gtkiconhelper.c:494:ensure_surface_for_gicon: assertion failed (error == NULL): Failed to load /usr/share/icons/Yaru/16x16/status/image-missing.png: Unable to load image-loading module: /snap/slack/42/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so: /snap/slack/42/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so: cannot open shared object file: No such file or directory (gdk-pixbuf-error-quark, 5)
Ah-ha! The last message printed is an error about a .png file and a .so file. As it's Slack's final mesage before crashing, this is a smoking gun. Note that this was not in any log!:
# grep image-missing.png *
grep: recorded-trace: Is a directory
It's the .so file that is missing, not the .png:
# ls -lh /usr/share/icons/Yaru/16x16/status/image-missing.png
-rw-r--r-- 1 root root 535 Nov  6  2020 /usr/share/icons/Yaru/16x16/status/image-missing.png
# ls -lh /snap/slack/42/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so
ls: cannot access '/snap/slack/42/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so': No such file or directory
But there is a .so file with a similar path:
# ls -lh /snap/slack/
total 0
drwxrwxr-x 8 root root 123 Jul 14 02:49 43/
drwxrwxr-x 8 root root 123 Aug 18 10:27 44/
lrwxrwxrwx 1 root root   2 Aug 24 09:48 current -> 44/
# ls -lh /snap/slack/44/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so
-rw-r--r-- 1 root root 27K Aug 18 10:27 /snap/slack/44/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so
Hmm, I wonder... ## 9. Workaround This is obviously a hack and is not guaranteed to be safe:
# cd /snap/slack
# ln -s current 42
# ls -lh
total 0
lrwxrwxrwx 1 root root   7 Aug 27 15:01 42 -> current/
drwxrwxr-x 8 root root 123 Jul 14 02:49 43/
drwxrwxr-x 8 root root 123 Aug 18 10:27 44/
lrwxrwxrwx 1 root root   2 Aug 24 09:48 current -> 44/
# ls -lh /snap/slack/42/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so
-rw-r--r-- 1 root root 27K Aug 18 10:27 /snap/slack/42/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so
I don't know why Slack was looking up this library via the old directory version, but linking the new version to the old path did the trick. Slack has stopped crashing! I'm guessing this is a problem with how the snap is built. Needs more debugging. ## 10. Other debugging In case you're wondering what I'd do if I didn't find the error in STDERR, I'd go back to setting ulimits to see if I could get a core dump, and if that still didn't work, I'd try to run Slack from a gdb(1) session. I'd also work on fixing the user stack trace and symbols to see what that revealed. ## 11. Bonus: opensnoop I often wonder how I could have debugged things sooner, and I'm kicking myself I didn't run opensnoop(8) as I usually do. Tracing just failed opens:
# opensnoop -Tx
TIME(s)       PID    COMM       FD ERR PATH
[...]
11.412358000  3673057 slack      -1   2 /var/lib/snapd/desktop/mime/subclasses
11.412360000  3673057 slack      -1   2 /var/lib/snapd/desktop/mime/icons
11.412363000  3673057 slack      -1   2 /var/lib/snapd/desktop/mime/generic-icons
11.412495000  3673057 slack      -1   2 /snap/slack/42/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so
11.412527000  3673057 slack      -1   2 /usr/share/locale/en_AU/LC_MESSAGES/gdk-pixbuf.mo
11.412537000  3673057 slack      -1   2 /usr/share/locale/en/LC_MESSAGES/gdk-pixbuf.mo
11.412559000  3673057 slack      -1   2 /usr/share/locale-langpack/en/LC_MESSAGES/gdk-pixbuf.mo
11.412916000  3673057 slack      -1   2 /snap/slack/42/usr/lib/x86_64-linux-gnu/gdk-pixbuf-2.0/2.10.0/loaders/libpixbufloader-png.so
11.425405000  1786   systemd    -1   2 /sys/fs/cgroup/memory/user.slice/user-1000.slice/user@1000.service/snap.slack.slack.402dde03-7f71-48a0-98a5-33fd695ccbde.scope/memory.events
That shows its last failed open was to the .so file. Which would have been a good lead. But the best clue was the secret STDERR messages Slack sends to /dev/null, rescued using shellsnoop(8).
Read the whole story
bernhardbock
4 days ago
reply
Share this story
Delete

Google VCU Video Coding Unit at Hot Chips 33

1 Comment

We get a deeper dive into the Google VCU or video coding unit that is responsible for optimized video transcoding at YouTube and elsewhere

The post Google VCU Video Coding Unit at Hot Chips 33 appeared first on ServeTheHome.

Read the whole story
bernhardbock
4 days ago
reply
fun to see the bleeding edge encoding tech from my dipl. ing. thesis as baseline now ;)
Share this story
Delete

Cerebras Wafer Scale Engine 2 WSE-2 at Hot Chips 33

1 Comment

At Hot Chips 33, we see the Cerebras Wafer Scale Engine 2 (WSE-2) and its memory (MemoryX) and interconnect (SwarmX) cluster solutions

The post Cerebras Wafer Scale Engine 2 WSE-2 at Hot Chips 33 appeared first on ServeTheHome.

Read the whole story
bernhardbock
4 days ago
reply
wafer scale computing to push the boundaries of machine learning. fascinating read.
Share this story
Delete

NVIDIA BlueField-3 DPU Architecture at Hot Chips 33

1 Comment

During Hot Chips 33, we got a deeper dive into the NVIDIA BlueField-3 DPU architecture which is a 2022 product we are excited about

The post NVIDIA BlueField-3 DPU Architecture at Hot Chips 33 appeared first on ServeTheHome.

Read the whole story
bernhardbock
4 days ago
reply
lots of complexity to come with DPUs...
Share this story
Delete

RFC 9096: Improving the Reaction of Customer Edge Routers to IPv6 Renumbering Events

1 Share
This document specifies improvements to Customer Edge routers that help mitigate the problems that may arise when network configuration information becomes invalid without any explicit signaling of that condition to the local nodes. This document updates RFC 7084.
Read the whole story
bernhardbock
4 days ago
reply
Share this story
Delete

How WhatsApp enables multi-device capability

1 Share

For years, people have been asking us to create a true multi-device experience that allows people to use WhatsApp on other devices without requiring a smartphone connection.

Today, we’re announcing the rollout of a limited public beta test for WhatsApp’s updated multi-device capability. 

With this new capability, you can now use WhatsApp on your phone and up to four other nonphone devices simultaneously — even if your phone battery is dead. Each companion device will connect to your WhatsApp independently while maintaining the same level of privacy and security through end-to-end encryption that people who use WhatsApp have come to expect. Importantly, we have developed new technologies to maintain end-to-end encryption while still managing to sync your data — such as contact names, chat archives, starred messages, and more — across devices.

To achieve this, we had to rethink WhatsApp’s architecture and design new systems to enable a standalone multi-device experience while preserving privacy and end-to-end encryption

Taking smartphones out of the equation

The current WhatsApp experience for companion devices on web, macOS, Windows, and Portal uses a smartphone app as the primary device, making the phone the source of truth for all user data and the only device capable of end-to-end encrypting messages for another user, initiating calls, etc. Companion devices maintain a persistent secure connection with the phone and simply mirror its contents on their own UI.

This architecture makes it easy to deliver a seamlessly synchronized experience between a phone and companion device without compromising on security. However, it comes with some significant reliability trade-offs: By requiring the phone to perform all operations, companion devices are slower and frequently get disconnected — especially when the phone has a poor connection, its battery is running low, or the application process gets killed by the phone’s OS. It also allows for only a single companion device to be operative at a time, meaning people can’t be on a call in Portal while checking their messages on their PC, for example. 

The new WhatsApp multi-device architecture removes these hurdles, no longer requiring a smartphone to be the source of truth while still keeping user data seamlessly and securely synchronized and private.

The challenge in accomplishing this was in maintaining the secure user experience across devices without having to store people’s private messages on our servers in new ways.

Meeting the security challenges of multiple devices

Prior to the introduction of multi-device, everyone on WhatsApp was identified by a single identity key from which all encrypted communication keys were derived. With multi-device, each device now has its own identity key.

The WhatsApp server maintains a mapping between each person’s account and all their device identities. When someone wants to send a message, they get their device list keys from the server.  

We have also addressed the challenge of preventing a malicious or compromised server from eavesdropping on someone’s communications by surreptitiously adding devices to someone’s account. We use a combination of technologies to solve this: First, we have extended security codes to now represent the combination of all of someone’s device identities so that anyone and their contact can always verify all the devices they are sending messages to. 

Second, in order to reduce the number of times that someone needs to perform identity verifications, we have developed and will roll out a technology called Automatic Device Verification. This system allows for devices to automatically establish trust between each other in a way that someone needs to compare another user’s security code only if that user reregisters their entire account, rather than each time they link a new device to their account. 

Finally, we also give people additional control and protections over which devices are linked to their account. First, everyone will continue to be required to link new companion devices by scanning a QR code from their phone. This process now requires biometric authentication before linking where people have enabled this feature on compatible devices. Finally, people will be able to see all the companion devices linked to their account as well as when they were last used, and will be able to log out of them remotely if needed. 

Maintaining message privacy

When people message each other in a one-on-one chat, a pairwise encrypted session is established between each of the sender’s and recipient’s devices. WhatsApp multi-device uses a client-fanout approach, where the WhatsApp client sending the message encrypts and transmits it N number of times to N number of different devices — those in the sender and receiver’s device lists. Each message is individually encrypted using the established pairwise encryption session with each device. Messages are not stored on the server after they are delivered. For groups, we still use the same scalable Sender Key encryption scheme from the Signal Protocol.

WhatsApp Multi-device graphic

Adapting voice and video protocols for multi-device, end-to-end encryption  

When someone on WhatsApp makes a voice or video call:

  1. The initiator generates a set of random 32-byte SRTP master secrets for each of the recipient’s devices.
  2. The initiator sends an incoming call message (using the client-fanout approach described above) to each of the devices of the recipient. Each recipient’s device receives this message, which contains the encrypted SRTP master secret.
  3. If the responder answers the call from one of the devices, a SRTP encrypted call is started, protected by the SRTP master secret generated for that device.

The SRTP master secret persists in memory on the client device and is used only during the call. Our servers do not have access to the SRTP master secrets.

For group calls, the server randomly selects a participant device that is in the call (either the initiator or a device on which a user has accepted the call) to generate the SRTP master secret. That device generates the secret and sends it to other active participant devices through pairwise end-to-end encryption. This process is repeated, and the keys are reset whenever someone joins or leaves the call.

Keeping message history and other application states in sync across devices

We want to ensure that people have a consistent experience with WhatsApp no matter the device they are using. To achieve this, we synchronize message history as well as other application state data (such as contact names, whether a chat is archived, or if a message is starred) across devices. All of this data is synchronized and end-to-end encrypted between your devices.

For message history: When a companion device is linked, the primary device encrypts a bundle of the messages from recent chats and transfers them to the newly linked device. The key to this encrypted message history blob is delivered to the newly linked device via an end-to-end encrypted message. After the companion device downloads, decrypts, unpacks, and stores the messages securely, the keys are deleted. From that point forward, the companion device accesses the message history from its own local database.

Other application data requires more than an initial transfer from the phone. We also need an ongoing synchronization every time someone modifies their application state (e.g., when they add a new contact, mute a chat, or star a message).

To solve this, the WhatsApp server securely stores a copy of each application state that all of someone’s devices can access. To properly secure this, all the information, and even the metadata about the information (what kind of user data is stored or accessed), is end-to-end encrypted with constantly changing keys known only to that person’s devices. 

How to try WhatsApp multi-device beta 

We plan to initially test the experience with a small group of users from our existing beta program. We will continue optimizing performance and adding a few additional features before slowly rolling it out more broadly. Those who opt in can always opt back out.

For more information about the beta and to sign up, visit the WhatsApp Help Center.

For more information about WhatsApp multi-device, read our updated whitepaper.

Read the whole story
bernhardbock
18 days ago
reply
Share this story
Delete
Next Page of Stories