How to find the corresponding session in MogDB/openGauss from OS thread ID

诊断需求

当MogDB数据库由于某种原因占用了较大的系统负载,比如CPU占用接近100%,那么如何知道到底是数据库里的哪个会话或者哪些会话占用了资源?
在Oracle数据库中,这样的问题诊断,通常都会关联 vsession, vprocess, 以及操作系统top命令或者ps命令中查到的操作系统进程ID。
但是MogDB本身是线程模型,在操作系统上只能看到一个进程号,那么该如何定位问题?

因为MogDB使用的是线程模型,与PostgreSQL的进程模型不同,在操作系统级别如果用lsof命令查看网络端口的接入,虽然是会看到有多个用户会话接入,但是在PID列只会显示进程ID,无法对应到线程。

$ lsof -i 4 -a -p `pgrep -u omm3 mogdb`
COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
mogdb   12027 omm3    8u  IPv4 20313752      0t0  TCP *:biimenu (LISTEN)
mogdb   12027 omm3    9u  IPv4 20313753      0t0  TCP *:18001 (LISTEN)
mogdb   12027 omm3  325u  IPv4 28320946      0t0  TCP mogdb-kernel-0004:biimenu->softbank060132034173.bbtec.net:45206 (ESTABLISHED)
mogdb   12027 omm3  330u  IPv4 28316174      0t0  TCP mogdb-kernel-0004:biimenu->softbank060132034173.bbtec.net:45208 (ESTABLISHED)
mogdb   12027 omm3  336u  IPv4 28302815      0t0  TCP mogdb-kernel-0004:biimenu->softbank060132034173.bbtec.net:45210 (ESTABLISHED)
mogdb   12027 omm3  340u  IPv4 28323140      0t0  TCP mogdb-kernel-0004:biimenu->softbank060132034173.bbtec.net:45212 (ESTABLISHED)
mogdb   12027 omm3  360u  IPv4 28323141      0t0  TCP mogdb-kernel-0004:biimenu->softbank060132034173.bbtec.net:45214 (ESTABLISHED)
mogdb   12027 omm3  375u  IPv4 28305050      0t0  TCP mogdb-kernel-0004:biimenu->softbank060132034173.bbtec.net:45216 (ESTABLISHED)

如何获取线程ID

可以使用htop,打开htop界面以后,按F5进行进程树形显示,第一个PID是进程号,而树形结构下的每一行的PID都是该进程中对应的线程号。

也可以直接使用ps命令。-L参数表示显示线程,-o参数可以指定显示感兴趣的列。

# ps -Lp `pgrep -u omm3 mogdb` -o %cpu,tid,pid,ppid,cmd,comm
%CPU   TID   PID  PPID CMD                         COMMAND
 0.0 17847 17847     1 /opt/mogdb3/app/bin/mogdb - mogdb
 0.0 17848 17847     1 /opt/mogdb3/app/bin/mogdb - jemalloc_bg_thd
 0.0 17854 17847     1 /opt/mogdb3/app/bin/mogdb - mogdb
 0.0 17855 17847     1 /opt/mogdb3/app/bin/mogdb - syslogger
 0.0 17856 17847     1 /opt/mogdb3/app/bin/mogdb - reaper
 0.0 17857 17847     1 /opt/mogdb3/app/bin/mogdb - jemalloc_bg_thd
 0.0 17858 17847     1 /opt/mogdb3/app/bin/mogdb - jemalloc_bg_thd
 0.0 17860 17847     1 /opt/mogdb3/app/bin/mogdb - jemalloc_bg_thd
 0.0 17884 17847     1 /opt/mogdb3/app/bin/mogdb - checkpointer
 0.0 17885 17847     1 /opt/mogdb3/app/bin/mogdb - Spbgwriter
 0.1 17886 17847     1 /opt/mogdb3/app/bin/mogdb - pagewriter
 0.0 17887 17847     1 /opt/mogdb3/app/bin/mogdb - pagewriter
 0.0 17888 17847     1 /opt/mogdb3/app/bin/mogdb - pagewriter
 0.0 17889 17847     1 /opt/mogdb3/app/bin/mogdb - pagewriter
 0.0 17890 17847     1 /opt/mogdb3/app/bin/mogdb - pagewriter
 0.8 17891 17847     1 /opt/mogdb3/app/bin/mogdb - WALwriter
 0.0 17892 17847     1 /opt/mogdb3/app/bin/mogdb - WALwriteraux
 0.0 17893 17847     1 /opt/mogdb3/app/bin/mogdb - AVClauncher
 0.0 17894 17847     1 /opt/mogdb3/app/bin/mogdb - Jobscheduler
 0.0 17895 17847     1 /opt/mogdb3/app/bin/mogdb - asyncundolaunch
 0.0 17896 17847     1 /opt/mogdb3/app/bin/mogdb - globalstats
 0.0 17897 17847     1 /opt/mogdb3/app/bin/mogdb - applylauncher
 0.0 17898 17847     1 /opt/mogdb3/app/bin/mogdb - statscollector
 0.0 17899 17847     1 /opt/mogdb3/app/bin/mogdb - snapshotworker
 0.1 17900 17847     1 /opt/mogdb3/app/bin/mogdb - TrackStmtWorker
 0.0 17901 17847     1 /opt/mogdb3/app/bin/mogdb - 2pccleaner
 0.0 17902 17847     1 /opt/mogdb3/app/bin/mogdb - faultmonitor
 0.0 17904 17847     1 /opt/mogdb3/app/bin/mogdb - undorecycler
 0.0 18372 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18373 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18374 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18375 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18376 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18377 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18378 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18379 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18380 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18381 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18382 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18454 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 19475 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 19480 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 29529 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 30999 17847     1 /opt/mogdb3/app/bin/mogdb - worker

comm列中显示为woker的都是用户会话的数据库端后台进程,通常占用CPU较高的会是用户会话,因此可以用grep命令来过滤只显示用户会话。

#  ps -Lp `pgrep -u omm3 mogdb` -o %cpu,tid,pid,ppid,cmd,comm | grep -w worker
 0.0 18372 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18373 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18374 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18375 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18376 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18377 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18378 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18379 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18380 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18381 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18382 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 18454 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 19475 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 19480 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 29529 17847     1 /opt/mogdb3/app/bin/mogdb - worker
 0.0 30999 17847     1 /opt/mogdb3/app/bin/mogdb - worker

如何将操作系统线程ID跟数据库会话对应

假设在这台服务器上,线程ID=18372的线程占用了大量CPU。

在MogDB里查询pg_os_threads视图,可以获得该线程在数据库中对应的会话ID 。

MogDB=# select * from pg_os_threads where lwpid=18372;
 node_name |       pid       | lwpid | thread_name |         creation_time
-----------+-----------------+-------+-------------+-------------------------------
 dn_6001   | 140545137571584 | 18372 | dn_6001     | 2022-05-30 19:54:42.459129+08
(1 row)

pg_os_threads视图里记录了轻量级线程号和会话ID的关系,其中lwpid是操作系统线程ID,pid列是数据库会话ID。具体信息可以参考MogDB文档-PG_OS_THREADS
如果具备monadmin权限,也可以查询dbe_perf模式下的os_threads视图,信息是一样的。

找到数据库会话ID之后,就可以为所欲为了,比如可以查询dbe_perf.session_stat_activity视图来获取该会话的应用名,发起的客户端IP地址,还有该会话正在执行的SQL。

MogDB=# select application_name,client_addr,query from dbe_perf.session_stat_activity where pid=140545137571584;
 application_name | client_addr  |                          query
------------------+--------------+---------------------------------------------------------
 dn_6001          | 172.16.0.176 | SELECT cfg_value FROM bmsql_config  WHERE cfg_name = $1
(1 row)

也可以查询dbe_perf.thread_wait_status视图来获取会话的当前等待事件。BTW,实际上该视图中有lwtid字段,可以直接对应到线程ID。

MogDB=# select lwtid,wait_status,wait_event from dbe_perf.thread_wait_status where sessionid=140545137571584;
 lwtid | wait_status | wait_event
-------+-------------+------------
 18372 | wait cmd    | wait cmd
(1 row)

Leave a Reply

Your email address will not be published. Required fields are marked *