发生锁超时时,如何从错误信息检测数据库状态并找到解决方法?

~ 0 min
2016-02-26 10:46

1) 检查错误信息:

锁超时是正常的数据库行为。多个事务之间存在竞争时,等待时间超过设置的最大时间,则引起锁超时。(默认的等待时间是5秒。可以使用”set lock time n seconds”重新设置间隔时间,也可以在dmconfig.ini中设置关键词DB_LTimO并重启数据库。)

我们可以通过以下方式检测锁状态:

一种方法是在dmconfig.ini中设置关键词

DB_LGDIR=<你的日志路径>

DB_LGLCK=1

DB_LGSVR=3 ;( 1......6)

如果用户想获得更多信息,可以设置DB_LGSVR>3,并在目录中检查有关锁超时的错误信息,锁超时发生时,该目录由DB_LGDIR指定。

查看以下错误信息:

文件“db_name_data.LOG”:

LOG_TIME,BEG_TIME,STATE,RETCODE,EXE_TIME,SV_FUNC,CONNECT_ID,USERNAME,LOGIN_TIME,LOGIN_ADDR,STMT_ID,ERROR_ARG,OTHER_INFO,SQL_CMD

"2010/06/12 15:01:29","2010/06/12 15:01:24","X", 1102,5.01,"EXECUTE ", 3868,"SYSADM","2010/06/12 14:59:56","local",0,": Object ID : 2.18.12 ,Connection ID : 1976, LKMode : X .NONE, LKStatus : GRANTED, Command : [EXIT] insert into t1 values(16,'Skirt') [unknown 1096], 501507, 0, 0","INFO_03868_1","update t1 set c2 = 'Hello' where c1 = 16"

"2010/06/12 15:07:25","2010/06/12 15:07:25","X", 6151,0.00,"PREPARE ", 1976,"SYSADM","2010/06/12 14:59:52","local",0,": 123 [bisort.c 294], 122, 2, 0",,"select * from t1 order by 123"

"2010/06/12 15:07:34","2010/06/12 15:07:34","X", 6151,0.00,"PREPARE ", 1976,"SYSADM","2010/06/12 14:59:52","local",0,": 3 [bisort.c 294], 2, 2, 0",,"select * from t1 order by 1,2,3"

 

文件File “db_name_data.TXT”:

INFO_03868_1 --------------------

[ERR_LK_TIMEOUT]:: Object ID : 2.18.12 ,Connection ID : 1976, LKMode : X .NONE, LKStatus : GRANTED, Command : [EXIT] insert into t1 values(16,'Skirt')

另一种方法是从SYSLOCK、SYSTABLE、SYSYUSER中搜索锁超时信息。

例如:

dmSQL> update t1 set c2 = 'Hello' where c1 = 16;

ERROR (1102): [Dbmaster] lock time-out: Object ID: 2.18.12, Connection ID: 1976, LKMode: X .NONE, LKStatus: GRANTED, Command: [EXIT] insert into t1 values (16,'Skirt')

 

锁超时发生时,对象ID = 2.18.12被ID = 1976占用,状态是X(在1976连接被释放前,没有用户可以获得对象:2.18.12),导致后来的连接无法获得正确的数据库信息,发生锁超时。

分析锁超时的原因:

首先,执行下列SQL语句,知道什么连接占用数据库资源。

dmSQL> select HOLD_LK_CONNECTION, LOGIN_IP_ADDR, LOGIN_HOST, LOGIN_TIME, USER_NAME, TABLE_OWNER, TABLE_NAME, LK_NEW_MODE, LK_GRAN, LK_STATUS, LK_CURRENT_MODE, TIME_OF_SQL_CMD, SQL_CMD from SYSTEM.SYSLOCK, SYSTEM.SYSTABLE, SYSTEM.SYSUSER where HOLD_LK_CONNECTION=CONNECTION_ID and TABLE_ID=TABLE_OID and LK_GRAN! ='SYSTEM' and TABLE_NAME not like 'SYS%' order by 1, 2, 3, 4, 5, 6;

 

HOLD_LK_CONNECTION LOGIN_IP_ADDR LOGIN_HOST LOGIN_TIME USER_NAME TABLE_OWNER TABLE_NAME LK_NEW_MODE LK_GRAN LK_STATUS LK_CURRENT_MODE TIME_OF_SQL_CMD

SQL_CMD

=======================================================================

1976 127.0.0.1 LA-WANGXUEWEI 2010/06/12 14:59:52 SYSADM SYSADM T1 NONE.NONE TABLE GRANTED IX .NONE 2010/06/12 15:01:00 [EXIT] insert into t1 values(16,'Skirt')

 

1976 127.0.0.1 LA-WANGXUEWEI 2010/06/12 14:59:52 SYSADM SYSADM T1 NONE.NONE TUPLE GRANTED X .NONE 2010/06/12 15:01:00 [EXIT] insert into t1 values(16,'Skirt')

 

1976 127.0.0.1 LA-WANGXUEWEI 2010/06/12 14:59:52 SYSADM SYSADM T1 NONE.NONE PAGE GRANTED IX .NONE 2010/06/12 15:01:00 [EXIT] insert into t1 values(16,'Skirt')

 

3868 127.0.0.1 LA-WANGXUEWEI 2010/06/12 14:59:56 SYSADM SYSADM T1 NONE.NONE TABLE GRANTED IX .NONE 2010/06/12 15:01:50 [EXEC] select HOLD_LK_CONNECTION, LOGIN_IP_ADDR, LOGIN_HOST, LOGIN_TIME, USER_NAME, TABLE_OWNER,TABLE_NAME, LK_NEW_MODE, LK_GRAN, LK_STATUS, LK_CURRENT_MODE, TIME_OF_SQL_CMD, SQL_CMD from SYSTEM.SYSLOCK, SYSTEM.SYSTABLE, SYSTEM.SYSUSER where HOLD_LK_CONNECTION=CONNECTION_ID and TABLE_ID=TABLE_OID and LK_GRAN !='SYSTEM' and TABLE_NAME not like 'SYS%' order by 1,2,3,4,5,6

 

3868 127.0.0.1 LA-WANGXUEWEI 2010/06/12 14:59:56 SYSADM SYSADM T1 NONE.NONE PAGE GRANTED IU .NONE 2010/06/12 15:01:50 [EXEC] select HOLD_LK_CONNECTION, LOGIN_IP_ADDR, LOGIN_HOST, LOGIN_TIME, USER_NAME, TABLE_OWNER,TABLE_NAME, LK_NEW_MODE, LK_GRAN, LK_STATUS, LK_CURRENT_MODE, TIME_OF_SQL_CMD, SQL_CMD from SYSTEM.SYSLOCK, SYSTEM.SYSTABLE, SYSTEM.SYSUSER where HOLD_LK_CONNECTION=CONNECTION_ID and TABLE_ID=TABLE_OID and LK_GRAN !='SYSTEM' and TABLE_NAME not like 'SYS%' order by 1,2,3,4,5,6

 

We found connection ID 1976 lock the resource by executing command: insert into t1 values (16,'Skirt').

接下来,执行下列SQL语句,获得连接1976和对象2.18.12的详细信息。

dmSQL> select HOLD_LK_CONNECTION, LOGIN_IP_ADDR, LOGIN_HOST, LOGIN_TIME, USER_NAME, TABLE_OWNER, TABLE_NAME, LK_NEW_MODE,

LK_GRAN, LK_STATUS, LK_CURRENT_MODE, TIME_OF_SQL_CMD, SQL_CMD from SYSLOCK, SYSTABLE, SYSTEM.SYSUSER where

HOLD_LK_CONNECTION=CONNECTION_ID and TABLE_ID=TABLE_OID and HOLD_LK_CONNECTION=1976 and LK_OBJECT_ID='2.18.12' order by

1, 2, 3, 4, 5, 6;

 

HOLD_LK_CONNECTION LOGIN_IP_ADDR LOGIN_HOST LOGIN_TIME USER_NAME TABLE_OWNER TABLE_NAME LK_NEW_MODE LK_GRAN LK_STATUS LK_CURRENT_MODE TIME_OF_SQL_CMD

SQL_CMD

=======================================================================

1976 127.0.0.1 LA-WANGXUEWEI 2010/06/12 14:59:52 SYSADM SYSADM T1 NONE.NONE TUPLE GRANTED X .NONE 2010/06/12 15:01:00 [EXIT] insert into t1 values(16,'Skirt')

 

 

该例表明:连接1976状态为[EXIT],并向t1中插入值(16,'Skirt')。它表明SQL命令已经被完整地执行,但是没有释放资源。原因可归结为两点:一是程序还没有提交;二是连接断开,导致后面的提交命令执行失败,数据库服务器陷入等待状态。由于以上原因,源2.18.12一直不能被释放。用户可以执行kill connection命令强制释放资源。

总的来说,用户可以通过以上两种方式获得信息,并开始分析锁超时的原因,就知道占有资源的连接的状态是繁忙还是空闲。如果行为正常,完全执行后将释放资源。如果连接处于空闲状态,应该执行“kill connection HOLD_LK_CONNECTION number”释放资源。

2)发生死锁的解决方法

为了提高系统的并发性,减少每个事务的等待时间,可以采取以下方法:

一. 缩短事务长度,使每个事务在尽可能短的时间内完成。

二. 降低锁粒度,并尽可能使用元组锁。(在DBMaster中,默认的锁粒度级别是行)

三. 使用索引扫描。多个事务通过不同索引扫描同一个表可能会提高并发的粒度。

四. 在不影响事务的有效性下,降低事务的隔离级别。(DBMaster默认的最低级别是:只读未提交)

版本:DBMaster 5.x

产品:Normal, Bundle

平台:Windows

 

平均分: 0 (0 投票)

你不能对该内容发表评论

标签