发生锁超时时,如何从错误信息检测数据库状态并找到解决方法?
1) 检查错误信息:
锁超时是正常的数据库行为。多个事务之间存在竞争时,等待时间超过设置的最大时间,则引起锁超时。(默认的等待时间是5秒。可以使用”set lock time n seconds”重新设置间隔时间,也可以在dmconfig.ini中设置关键词DB_LTimO并重启数据库。)
我们可以通过以下方式检测锁状态:
一种方法是在dmconfig.ini中设置关键词
DB_LGDIR=<你的日志路径>
DB_LGLCK=1
DB_LGSVR=3 ;( 1......6)
如果用户想获得更多信息,可以设置DB_LGSVR>3,并在目录中检查有关锁超时的错误信息,锁超时发生时,该目录由DB_LGDIR指定。
查看以下错误信息:
文件“db_name_data.LOG”:
LOG_TIME,BEG_TIME,STATE,RETCODE,EXE_TIME,SV_FUNC,CONNECT_ID,USERNAME,LOGIN_TIME,LOGIN_ADDR,STMT_ID,ERROR_ARG,OTHER_INFO,SQL_CMD
"2010/06/12 15:01:29","2010/06/12 15:01:24","X", 1102,5.01,"EXECUTE ", 3868,"SYSADM","2010/06/12 14:59:56","local",0,": Object ID : 2.18.12 ,Connection ID : 1976, LKMode : X .NONE, LKStatus : GRANTED, Command : [EXIT] insert into t1 values(16,'Skirt') [unknown 1096], 501507, 0, 0","INFO_03868_1","update t1 set c2 = 'Hello' where c1 = 16"
"2010/06/12 15:07:25","2010/06/12 15:07:25","X", 6151,0.00,"PREPARE ", 1976,"SYSADM","2010/06/12 14:59:52","local",0,": 123 [bisort.c 294], 122, 2, 0",,"select * from t1 order by 123"
"2010/06/12 15:07:34","2010/06/12 15:07:34","X", 6151,0.00,"PREPARE ", 1976,"SYSADM","2010/06/12 14:59:52","local",0,": 3 [bisort.c 294], 2, 2, 0",,"select * from t1 order by 1,2,3"
文件File “db_name_data.TXT”:
INFO_03868_1 --------------------
[ERR_LK_TIMEOUT]:: Object ID : 2.18.12 ,Connection ID : 1976, LKMode : X .NONE, LKStatus : GRANTED, Command : [EXIT] insert into t1 values(16,'Skirt')
另一种方法是从SYSLOCK、SYSTABLE、SYSYUSER中搜索锁超时信息。
例如:
dmSQL> update t1 set c2 = 'Hello' where c1 = 16;
ERROR (1102): [Dbmaster] lock time-out: Object ID: 2.18.12, Connection ID: 1976, LKMode: X .NONE, LKStatus: GRANTED, Command: [EXIT] insert into t1 values (16,'Skirt')
锁超时发生时,对象ID = 2.18.12被ID = 1976占用,状态是X(在1976连接被释放前,没有用户可以获得对象:2.18.12),导致后来的连接无法获得正确的数据库信息,发生锁超时。
分析锁超时的原因:
首先,执行下列SQL语句,知道什么连接占用数据库资源。
dmSQL> select HOLD_LK_CONNECTION, LOGIN_IP_ADDR, LOGIN_HOST, LOGIN_TIME, USER_NAME, TABLE_OWNER, TABLE_NAME, LK_NEW_MODE, LK_GRAN, LK_STATUS, LK_CURRENT_MODE, TIME_OF_SQL_CMD, SQL_CMD from SYSTEM.SYSLOCK, SYSTEM.SYSTABLE, SYSTEM.SYSUSER where HOLD_LK_CONNECTION=CONNECTION_ID and TABLE_ID=TABLE_OID and LK_GRAN! ='SYSTEM' and TABLE_NAME not like 'SYS%' order by 1, 2, 3, 4, 5, 6;
HOLD_LK_CONNECTION LOGIN_IP_ADDR LOGIN_HOST LOGIN_TIME USER_NAME TABLE_OWNER TABLE_NAME LK_NEW_MODE LK_GRAN LK_STATUS LK_CURRENT_MODE TIME_OF_SQL_CMD
SQL_CMD
=======================================================================
1976 127.0.0.1 LA-WANGXUEWEI 2010/06/12 14:59:52 SYSADM SYSADM T1 NONE.NONE TABLE GRANTED IX .NONE 2010/06/12 15:01:00 [EXIT] insert into t1 values(16,'Skirt')
1976 127.0.0.1 LA-WANGXUEWEI 2010/06/12 14:59:52 SYSADM SYSADM T1 NONE.NONE TUPLE GRANTED X .NONE 2010/06/12 15:01:00 [EXIT] insert into t1 values(16,'Skirt')
1976 127.0.0.1 LA-WANGXUEWEI 2010/06/12 14:59:52 SYSADM SYSADM T1 NONE.NONE PAGE GRANTED IX .NONE 2010/06/12 15:01:00 [EXIT] insert into t1 values(16,'Skirt')
3868 127.0.0.1 LA-WANGXUEWEI 2010/06/12 14:59:56 SYSADM SYSADM T1 NONE.NONE TABLE GRANTED IX .NONE 2010/06/12 15:01:50 [EXEC] select HOLD_LK_CONNECTION, LOGIN_IP_ADDR, LOGIN_HOST, LOGIN_TIME, USER_NAME, TABLE_OWNER,TABLE_NAME, LK_NEW_MODE, LK_GRAN, LK_STATUS, LK_CURRENT_MODE, TIME_OF_SQL_CMD, SQL_CMD from SYSTEM.SYSLOCK, SYSTEM.SYSTABLE, SYSTEM.SYSUSER where HOLD_LK_CONNECTION=CONNECTION_ID and TABLE_ID=TABLE_OID and LK_GRAN !='SYSTEM' and TABLE_NAME not like 'SYS%' order by 1,2,3,4,5,6
3868 127.0.0.1 LA-WANGXUEWEI 2010/06/12 14:59:56 SYSADM SYSADM T1 NONE.NONE PAGE GRANTED IU .NONE 2010/06/12 15:01:50 [EXEC] select HOLD_LK_CONNECTION, LOGIN_IP_ADDR, LOGIN_HOST, LOGIN_TIME, USER_NAME, TABLE_OWNER,TABLE_NAME, LK_NEW_MODE, LK_GRAN, LK_STATUS, LK_CURRENT_MODE, TIME_OF_SQL_CMD, SQL_CMD from SYSTEM.SYSLOCK, SYSTEM.SYSTABLE, SYSTEM.SYSUSER where HOLD_LK_CONNECTION=CONNECTION_ID and TABLE_ID=TABLE_OID and LK_GRAN !='SYSTEM' and TABLE_NAME not like 'SYS%' order by 1,2,3,4,5,6
We found connection ID 1976 lock the resource by executing command: insert into t1 values (16,'Skirt').
接下来,执行下列SQL语句,获得连接1976和对象2.18.12的详细信息。
dmSQL> select HOLD_LK_CONNECTION, LOGIN_IP_ADDR, LOGIN_HOST, LOGIN_TIME, USER_NAME, TABLE_OWNER, TABLE_NAME, LK_NEW_MODE,
LK_GRAN, LK_STATUS, LK_CURRENT_MODE, TIME_OF_SQL_CMD, SQL_CMD from SYSLOCK, SYSTABLE, SYSTEM.SYSUSER where
HOLD_LK_CONNECTION=CONNECTION_ID and TABLE_ID=TABLE_OID and HOLD_LK_CONNECTION=1976 and LK_OBJECT_ID='2.18.12' order by
1, 2, 3, 4, 5, 6;
HOLD_LK_CONNECTION LOGIN_IP_ADDR LOGIN_HOST LOGIN_TIME USER_NAME TABLE_OWNER TABLE_NAME LK_NEW_MODE LK_GRAN LK_STATUS LK_CURRENT_MODE TIME_OF_SQL_CMD
SQL_CMD
=======================================================================
1976 127.0.0.1 LA-WANGXUEWEI 2010/06/12 14:59:52 SYSADM SYSADM T1 NONE.NONE TUPLE GRANTED X .NONE 2010/06/12 15:01:00 [EXIT] insert into t1 values(16,'Skirt')
该例表明:连接1976状态为[EXIT],并向t1中插入值(16,'Skirt')。它表明SQL命令已经被完整地执行,但是没有释放资源。原因可归结为两点:一是程序还没有提交;二是连接断开,导致后面的提交命令执行失败,数据库服务器陷入等待状态。由于以上原因,源2.18.12一直不能被释放。用户可以执行kill connection命令强制释放资源。
总的来说,用户可以通过以上两种方式获得信息,并开始分析锁超时的原因,就知道占有资源的连接的状态是繁忙还是空闲。如果行为正常,完全执行后将释放资源。如果连接处于空闲状态,应该执行“kill connection HOLD_LK_CONNECTION number”释放资源。
2)发生死锁的解决方法
为了提高系统的并发性,减少每个事务的等待时间,可以采取以下方法:
一. 缩短事务长度,使每个事务在尽可能短的时间内完成。
二. 降低锁粒度,并尽可能使用元组锁。(在DBMaster中,默认的锁粒度级别是行)
三. 使用索引扫描。多个事务通过不同索引扫描同一个表可能会提高并发的粒度。
四. 在不影响事务的有效性下,降低事务的隔离级别。(DBMaster默认的最低级别是:只读未提交)
版本:DBMaster 5.x
产品:Normal, Bundle
平台:Windows