I was curious about Oracle Data Guard over long distances with synchronous redo transfer. In this mode a session waits on commit until the redo is written at the standby site, so when the standby is far away, most of the commit time is spent waiting for the redo write acknowledgement. I'll demonstrate this in today's post.
TOC
- Overview
- Initial setup
- Adding network delay with netem
- SYNC redo transfer mode
- Parameter COMMIT_WAIT
- Sources
Initial setup
The environment setup is the same as in the previous post.
At first the protection mode is set to Maximum Performance
and the RTT is not artificially increased:
[oracle@centos-dg1 PROD ~]$ dgmgrl -silent / "show configuration" Configuration - dg Protection Mode: MaxPerformance Databases: prod - Primary database stprod - Physical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS [oracle@centos-dg1 PROD ~]$ dgmgrl -silent / "show database prod LogXptMode" LogXptMode = 'ASYNC' [oracle@centos-dg1 PROD ~]$ dgmgrl -silent / "show database stprod LogXptMode" LogXptMode = 'ASYNC' [oracle@centos-dg1 PROD ~]$ dgmgrl -silent / "show database stprod DelayMins" DelayMins = '0' [oracle@centos-dg1 PROD ~]$ dgmgrl -silent / "show database prod DelayMins" DelayMins = '0' [oracle@centos-dg1 PROD ~]$ ping centos-dg2-dg PING centos-dg2-dg (192.168.253.11) 56(84) bytes of data. 64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=1 ttl=64 time=0.302 ms 64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=2 ttl=64 time=0.314 ms 64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=3 ttl=64 time=0.256 ms 64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=4 ttl=64 time=0.308 ms ^C --- centos-dg2-dg ping statistics --- 4 packets transmitted, 4 received, 0% packet loss, time 3000ms rtt min/avg/max/mdev = 0.256/0.295/0.314/0.022 ms
I will measure how long it takes to insert one row into a table. SQL*Plus commits on exit, so the transaction will be committed:
[oracle@centos-dg1 PROD ~]$ time echo "insert into test values(sysdate);" | sqlplus -S -L / as sysdba

1 row created.


real    0m0.053s
user    0m0.011s
sys     0m0.011s
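The definition of the test table isn't shown in this post; judging from the insert above, a single DATE column is assumed. A minimal sketch of how it could have been created:

# assumption: a one-column table is enough for this test
echo "create table test(d date);" | sqlplus -S -L / as sysdba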
Adding network delay with netem
I'll use the network emulation (netem) functionality that is part of the Linux kernel. With the following command I will add a 250 ms delay (±50 ms, normally distributed) to device eth1. This device is part of a private network dedicated to Data Guard communication.
[root@centos-dg1 PROD ~]$ tc qdisc add dev eth1 root netem delay 250ms 50ms distribution normal
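To double-check that the rule is in place, the active qdisc on the device can be listed at any time (standard tc command; output omitted here):

tc qdisc show dev eth1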
This is how the RTT looks now:
[oracle@centos-dg1 PROD ~]$ ping centos-dg2-dg
PING centos-dg2-dg (192.168.253.11) 56(84) bytes of data.
64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=1 ttl=64 time=305 ms
64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=2 ttl=64 time=234 ms
64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=3 ttl=64 time=227 ms
64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=4 ttl=64 time=256 ms
64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=5 ttl=64 time=337 ms
64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=6 ttl=64 time=318 ms
^C
--- centos-dg2-dg ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 227.266/279.950/337.239/42.477 ms
It is working! And it works from the other node too:
[oracle@centos-dg2 STPROD ~]$ ping centos-dg1-dg
PING centos-dg1-dg (192.168.253.10) 56(84) bytes of data.
64 bytes from centos-dg1-dg (192.168.253.10): icmp_seq=1 ttl=64 time=383 ms
64 bytes from centos-dg1-dg (192.168.253.10): icmp_seq=2 ttl=64 time=305 ms
64 bytes from centos-dg1-dg (192.168.253.10): icmp_seq=3 ttl=64 time=310 ms
64 bytes from centos-dg1-dg (192.168.253.10): icmp_seq=4 ttl=64 time=322 ms
64 bytes from centos-dg1-dg (192.168.253.10): icmp_seq=5 ttl=64 time=232 ms
64 bytes from centos-dg1-dg (192.168.253.10): icmp_seq=6 ttl=64 time=217 ms
64 bytes from centos-dg1-dg (192.168.253.10): icmp_seq=7 ttl=64 time=280 ms
^C
--- centos-dg1-dg ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6004ms
rtt min/avg/max/mdev = 217.610/293.159/383.449/51.966 ms
And what about the host machine:
[hasul@kra ~]$ ping centos-dg1
PING centos-dg1 (192.168.122.175) 56(84) bytes of data.
64 bytes from centos-dg1 (192.168.122.175): icmp_seq=1 ttl=64 time=0.172 ms
64 bytes from centos-dg1 (192.168.122.175): icmp_seq=2 ttl=64 time=0.211 ms
^C
--- centos-dg1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.172/0.191/0.211/0.023 ms
[hasul@kra ~]$ ping centos-dg2
PING centos-dg2 (192.168.122.29) 56(84) bytes of data.
64 bytes from centos-dg2 (192.168.122.29): icmp_seq=1 ttl=64 time=0.173 ms
64 bytes from centos-dg2 (192.168.122.29): icmp_seq=2 ttl=64 time=0.131 ms
64 bytes from centos-dg2 (192.168.122.29): icmp_seq=3 ttl=64 time=0.286 ms
^C
--- centos-dg2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.131/0.196/0.286/0.067 ms
[hasul@kra ~]$ ping centos-dg2-dg
PING centos-dg2-dg (192.168.253.11) 56(84) bytes of data.
64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=1 ttl=64 time=0.222 ms
64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=2 ttl=64 time=0.239 ms
64 bytes from centos-dg2-dg (192.168.253.11): icmp_seq=3 ttl=64 time=0.289 ms
^C
--- centos-dg2-dg ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.222/0.250/0.289/0.028 ms
[hasul@kra ~]$ ping centos-dg1-dg
PING centos-dg1-dg (192.168.253.10) 56(84) bytes of data.
64 bytes from centos-dg1-dg (192.168.253.10): icmp_seq=1 ttl=64 time=247 ms
64 bytes from centos-dg1-dg (192.168.253.10): icmp_seq=2 ttl=64 time=296 ms
64 bytes from centos-dg1-dg (192.168.253.10): icmp_seq=3 ttl=64 time=241 ms
^C
--- centos-dg1-dg ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 241.131/261.660/296.060/24.482 ms
Only the private interface on the first guest is affected, as expected: netem delays packets leaving eth1 on centos-dg1, so even the echo replies sent back to the host are delayed, while the interfaces of centos-dg2 are untouched.
SYNC redo transfer mode
Let's see if this change affects the transaction time:
[oracle@centos-dg1 PROD ~]$ time echo "insert into test values(sysdate);" | sqlplus -S -L / as sysdba

1 row created.


real    0m0.042s
user    0m0.008s
sys     0m0.014s
No significant change, because synchronous transfer is not enabled yet. Let's fix that:
[oracle@centos-dg1 PROD ~]$ dgmgrl -silent / "edit database prod set property LogXptMode = 'SYNC'" Property "logxptmode" updated [oracle@centos-dg1 PROD ~]$ dgmgrl -silent / "edit database stprod set property LogXptMode = 'SYNC'" Property "logxptmode" updated [oracle@centos-dg1 PROD ~]$ dgmgrl -silent / "edit configuration set protection mode as 'MaxAvailability'"
Hopefully it will work:
[oracle@centos-dg1 PROD ~]$ time echo "insert into test values(sysdate);" | sqlplus -S -L / as sysdba

1 row created.


real    0m0.446s
user    0m0.014s
sys     0m0.012s
Yes, there is a big difference: 0.04 s vs 0.4 s. But the first time was recorded while Data Guard was still running in Maximum Performance mode.
I'll remove the delay to get a more relevant result:
[root@centos-dg1 PROD ~]$ tc qdisc del dev eth1 root netem
The RTT is back to normal:
[oracle@centos-dg2 STPROD ~]$ ping centos-dg1-dg
PING centos-dg1-dg (192.168.253.10) 56(84) bytes of data.
64 bytes from centos-dg1-dg (192.168.253.10): icmp_seq=1 ttl=64 time=0.231 ms
64 bytes from centos-dg1-dg (192.168.253.10): icmp_seq=2 ttl=64 time=0.258 ms
^C
--- centos-dg1-dg ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.231/0.244/0.258/0.020 ms
And the transaction duration is:
[oracle@centos-dg1 PROD ~]$ time echo "insert into test values(sysdate);" | sqlplus -S -L / as sysdba

1 row created.


real    0m0.048s
user    0m0.009s
sys     0m0.010s
The result is pretty much the same as with ASYNC mode.
Parameter COMMIT_WAIT
This parameter controls whether the committing session waits for the redo to be written to disk (WAIT) or not (NOWAIT). I'm curious whether it can impact the transaction time. I'll set the parameter to NOWAIT and run the transaction:
[oracle@centos-dg1 PROD ~]$ echo "alter system set commit_wait = 'NOWAIT';" | sqlplus -S -L / as sysdba System altered. [oracle@centos-dg1 PROD ~]$ time echo "insert into test values(sysdate);" | sqlplus -S -L / as sysdba 1 row created. real 0m0.046s user 0m0.011s sys 0m0.011s
No difference. I'll enable the network delay one last time:
[oracle@centos-dg1 PROD ~]$ sudo tc qdisc add dev eth1 root netem delay 250ms 50ms distribution normal
[oracle@centos-dg1 PROD ~]$ time echo "insert into test values(sysdate);" | sqlplus -S -L / as sysdba

1 row created.


real    0m0.326s
user    0m0.010s
sys     0m0.008s
It's faster than the earlier run without commit_wait = 'NOWAIT' (0.33 s vs 0.45 s), but I think that is only by chance. I should have run multiple transactions and/or attempts and calculated average values. It appears that the commit_wait parameter only controls flushing redo to the local redo logs, not to the standby redo logs.
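A sketch of what a better measurement could look like, assuming GNU time is available at /usr/bin/time and using a hypothetical scratch file /tmp/times.txt for the samples:

# run the insert 20 times, recording each elapsed time in seconds
rm -f /tmp/times.txt
for i in $(seq 1 20); do
  /usr/bin/time -f "%e" -a -o /tmp/times.txt \
    sh -c 'echo "insert into test values(sysdate);" | sqlplus -S -L / as sysdba > /dev/null'
done
# average the samples
awk '{ sum += $1 } END { printf "avg: %.3f s over %d runs\n", sum/NR, NR }' /tmp/times.txt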