
Physical replication is also called streaming replication. The primary sends its WAL to the standby, and the standby replays the WAL after receiving it.

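How logical replication works:

Logical replication is also based on WAL. In logical replication the primary is called the source database and the standby is called the target database. The source database decodes the WAL according to predefined logical decoding rules, turning DML operations into logical change information (conceptually, standard SQL statements). It sends these changes to the target database, which applies them on receipt, and this keeps the data in sync.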
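To make the decode-then-apply flow concrete, here is a minimal toy sketch in Python. It is not PostgreSQL's decoder or wire protocol: `WalRecord`, `decode`, and `apply_changes` are hypothetical names invented for illustration, and a "table" is just a dictionary keyed by primary key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical, heavily simplified WAL record for one row change."""
    lsn: int
    table: str
    op: str                      # "INSERT", "UPDATE" or "DELETE"
    key: int
    row: Optional[dict] = None   # new row contents; None for DELETE


def decode(wal, selected_tables):
    """Source side: keep only changes to the selected tables, in LSN order."""
    return [rec for rec in sorted(wal, key=lambda r: r.lsn)
            if rec.table in selected_tables]


def apply_changes(target, changes):
    """Target side: apply each logical change to the target's own tables."""
    for rec in changes:
        rows = target.setdefault(rec.table, {})
        if rec.op == "DELETE":
            rows.pop(rec.key, None)
        else:                    # INSERT and UPDATE both store the new row
            rows[rec.key] = rec.row
    return target


wal = [
    WalRecord(1, "orders", "INSERT", 1, {"qty": 2}),
    WalRecord(2, "audit", "INSERT", 7, {"msg": "x"}),   # table not selected
    WalRecord(3, "orders", "UPDATE", 1, {"qty": 5}),
    WalRecord(4, "orders", "INSERT", 2, {"qty": 1}),
    WalRecord(5, "orders", "DELETE", 2),
]
target = apply_changes({}, decode(wal, {"orders"}))
```

Only the `orders` changes reach the target, which is the table-level selectivity that distinguishes logical replication from replaying the whole WAL stream.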

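Differences between streaming replication and logical replication:

By default, a commit on the primary does not wait for the standby to confirm receipt of the WAL, and this holds for both streaming and logical replication. Either one waits only when it is configured as synchronous, that is, when the standby (or subscriber) is listed in synchronous_standby_names, in which case the wait is governed by synchronous_commit.

Streaming replication requires the primary and standby to run the same major version. Logical replication can synchronize data across major versions and can also be used to synchronize heterogeneous databases.

With streaming replication the primary is read-write and the standby is read-only. With logical replication the target database must be read-write.

Streaming replication works at the instance level (the entire PostgreSQL cluster). Logical replication selectively copies certain tables, so it works at the table level.

Streaming replication carries both the DDL and the DML executed on the primary. Logical replication carries only DML.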

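Addendum: A brief look at the principles and code of PostgreSQL synchronous streaming replication

Background

How durability in database ACID is implemented

The D in ACID stands for durability. It means that once a transaction has been reported to the user as committed, its data is reliable: even if the database crashes, the data can be recovered as long as the hardware is intact.

How does PostgreSQL achieve this? The original post shows a rough diagram at this point; the steps it depicts are described below.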

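Suppose a transaction performs some operations on the database and produces dirty data. That dirty data first lives in the database's shared buffers.

At the same time, the corresponding redo information is generated. Every redo record has an LSN (think of it as a unique offset in the virtual address space of the redo stream), and that LSN is also recorded in the corresponding dirty page in shared buffers.

walwriter is the process responsible for flushing the WAL buffers to durable storage. It also updates a global variable that records the highest LSN flushed so far.

bgwriter is the process responsible for writing dirty pages from shared buffers to durable storage. When it writes a page, besides following the LRU algorithm, it compares the page's LSN against that global variable to make sure the redo records for the page have already been flushed. If the corresponding redo has not been persisted yet, a flush of the WAL buffers is triggered first. (In other words, the log always reaches disk before the dirty data.)

When the user commits, a commit redo record is generated as well, and it also carries an LSN. The backend process likewise has to wait until that LSN has been flushed to disk before it reports a successful commit to the user. (The log reaches disk first, and only then does the user get the reply.)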
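The ordering rule described above (log first, then dirty page; log first, then commit reply) can be sketched as a small toy model. This is only an illustration under simplifying assumptions: LSNs are plain integers, and `ToyWal` and all of its method names are hypothetical, not PostgreSQL APIs.

```python
class ToyWal:
    """Toy model: LSNs are integers, 'disk' is a pair of Python containers."""

    def __init__(self):
        self.insert_lsn = 0      # LSN of the newest redo record in the WAL buffer
        self.flushed_lsn = 0     # global "highest LSN flushed so far"
        self.page_lsn = {}       # dirty pages in shared buffers -> LSN of last redo
        self.pages_on_disk = {}  # pages already written out, with their LSN

    def modify_page(self, page):
        """A change creates a redo record and stamps its LSN on the dirty page."""
        self.insert_lsn += 1
        self.page_lsn[page] = self.insert_lsn
        return self.insert_lsn

    def flush_wal(self, upto):
        """Flush the WAL buffer up to `upto`; the flushed LSN never moves back."""
        self.flushed_lsn = max(self.flushed_lsn, min(upto, self.insert_lsn))

    def write_dirty_page(self, page):
        """bgwriter-style write: the page's redo must be durable first."""
        lsn = self.page_lsn.pop(page)
        if lsn > self.flushed_lsn:
            self.flush_wal(lsn)
        self.pages_on_disk[page] = lsn

    def commit(self):
        """Commit adds its own redo record and returns only after it is flushed."""
        self.insert_lsn += 1
        commit_lsn = self.insert_lsn
        self.flush_wal(commit_lsn)
        return commit_lsn


wal = ToyWal()
wal.modify_page("A")         # redo record with LSN 1
wal.modify_page("B")         # redo record with LSN 2
wal.write_dirty_page("B")    # forces the WAL to be flushed up to LSN 2 first
commit_lsn = wal.commit()    # commit record with LSN 3, flushed before returning
```

In this model no page on disk ever carries an LSN beyond `flushed_lsn`, which is the invariant that lets crash recovery rebuild the data from the log.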

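How synchronous replication works

Synchronous streaming replication guarantees that the log is persisted on both the standby node and the local node.

PostgreSQL uses another set of global variables to record the XLOG LSN that the synchronous standby has received and the XLOG LSN that it has persisted.

After the user issues a commit, the backend process checks not only whether the local WAL has been persisted, but also whether the synchronous standby has received or persisted the XLOG (which of the two is controlled by the synchronous_commit parameter).

If the synchronous standby has not yet received or persisted the XLOG, the backend process enters a wait state.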
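As a rough sketch of that decision, the function below maps each synchronous_commit value to the standby LSN the backend compares against. The function name and its parameters are hypothetical; only the four setting values and the "commit LSN versus confirmed LSN" comparison come from the article.

```python
def needs_standby_wait(mode, commit_lsn, standby_write_lsn, standby_flush_lsn,
                       sync_standby_defined=True):
    """Return True if the backend still has to wait for the sync standby."""
    # "off" and "local" never wait for a standby; neither does a setup
    # without any synchronous standby defined.
    if mode in ("off", "local") or not sync_standby_defined:
        return False
    if mode == "remote_write":
        confirmed = standby_write_lsn    # standby has received and written it
    elif mode == "on":
        confirmed = standby_flush_lsn    # standby has persisted it
    else:
        raise ValueError(f"unknown synchronous_commit value: {mode!r}")
    # Nothing to wait for once the standby has confirmed up to the commit LSN.
    return commit_lsn > confirmed
```

For example, with a commit at LSN 100, a standby that has written up to 120 but flushed only up to 90 satisfies `remote_write` but not `on`.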

A look at the synchronous replication code

The relevant code, with explanations, is as follows:

CommitTransaction @ src/backend/access/transam/xact.c

RecordTransactionCommit @ src/backend/access/transam/xact.c

    /*
     * If we didn't create XLOG entries, we're done here; otherwise we
     * should trigger flushing those entries the same as a commit record
     * would. This will primarily happen for HOT pruning and the like; we
     * want these to be flushed to disk in due time.
     */
    if (!wrote_xlog) // the transaction generated no redo: return directly
        goto cleanup;
    ...
    if (wrote_xlog && markXidCommitted) // redo was generated: wait for synchronous replication
        SyncRepWaitForLSN(XactLastRecEnd);

SyncRepWaitForLSN @ src/backend/replication/syncrep.c

    /*
     * Wait for synchronous replication, if requested by user.
     *
     * Initially backends start in state SYNC_REP_NOT_WAITING and then
     * change that state to SYNC_REP_WAITING before adding ourselves
     * to the wait queue. During SyncRepWakeQueue() a WALSender changes
     * the state to SYNC_REP_WAIT_COMPLETE once replication is confirmed.
     * This backend then resets its state to SYNC_REP_NOT_WAITING.
     */
    void
    SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
    {
    ...
        /*
         * Fast exit if user has not requested sync replication, or there are no
         * sync replication standby names defined. Note that those standbys don't
         * need to be connected.
         */
        if (!SyncRepRequested() || !SyncStandbysDefined()) // not a synchronous transaction, or no synchronous standby defined: return directly
            return;
    ...
        /*
         * We don't wait for sync rep if WalSndCtl->sync_standbys_defined is not
         * set. See SyncRepUpdateSyncStandbysDefined.
         *
         * Also check that the standby hasn't already replied. Unlikely race
         * condition but we'll be fetching that cache line anyway so it's likely
         * to be a low cost check.
         */
        if (!WalSndCtl->sync_standbys_defined ||
            XactCommitLSN <= WalSndCtl->lsn[mode]) // no synchronous standby defined, or the commit LSN is not beyond the LSN the standby has already confirmed: return directly
        {
            LWLockRelease(SyncRepLock);
            return;
        }
    ...
        // Reaching the wait loop means the local XLOG has already been flushed; we are only waiting for the synchronous standby to confirm the redo.
        /*
         * Wait for specified LSN to be confirmed.
         *
         * Each proc has its own wait latch, so we perform a normal latch
         * check/wait loop here.
         */
        for (;;) // wait loop: check whether the condition for ending the wait is met (the WAL sender sets the latch as the standby's redo progress advances)
        {
            int   syncRepState;

            /* Must reset the latch before testing state. */
            ResetLatch(&MyProc->procLatch);

            syncRepState = MyProc->syncRepState;
            if (syncRepState == SYNC_REP_WAITING)
            {
                LWLockAcquire(SyncRepLock, LW_SHARED);
                syncRepState = MyProc->syncRepState;
                LWLockRelease(SyncRepLock);
            }
            if (syncRepState == SYNC_REP_WAIT_COMPLETE) // the XLOG has been replicated: leave the wait loop
                break;

            // If this backend is being terminated: the warning says the transaction is persisted locally but may not yet be persisted on the standby.
            if (ProcDiePending)
            {
                ereport(WARNING,
                        (errcode(ERRCODE_ADMIN_SHUTDOWN),
                         errmsg("canceling the wait for synchronous replication and terminating connection due to administrator command"),
                         errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
                whereToSendOutput = DestNone;
                SyncRepCancelWait();
                break;
            }

            // If the user cancels the query: the warning again says the transaction is persisted locally but may not yet be persisted on the standby.
            if (QueryCancelPending)
            {
                QueryCancelPending = false;
                ereport(WARNING,
                        (errmsg("canceling wait for synchronous replication due to user request"),
                         errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
                SyncRepCancelWait();
                break;
            }

            // If the postmaster has died, start exiting.
            if (!PostmasterIsAlive())
            {
                ProcDiePending = true;
                whereToSendOutput = DestNone;
                SyncRepCancelWait();
                break;
            }

            // Wait for the WAL sender to set this backend's latch.
            /*
             * Wait on latch. Any condition that should wake us up will set the
             * latch, so no need for timeout.
             */
            WaitLatch(&MyProc->procLatch, WL_LATCH_SET | WL_POSTMASTER_DEATH, -1);
        }
    ...

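Note that once a session is in this wait state, apart from the standby confirming the WAL, only an explicit cancel, a kill (terminate), or the death of the postmaster ends the otherwise unbounded wait. How to downgrade from synchronous to asynchronous is covered later.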
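The shape of this wait loop (reset the latch, test the state, handle a cancel, then sleep with no timeout) can be modeled with a `threading.Event` standing in for the process latch. This is a simplified sketch, not the PostgreSQL implementation: the `Backend` class, its method names, and the `run` helper are hypothetical, and termination and postmaster death are left out.

```python
import threading

NOT_WAITING, WAITING, WAIT_COMPLETE = range(3)


class Backend:
    def __init__(self):
        self.latch = threading.Event()   # stand-in for MyProc->procLatch
        self.state = NOT_WAITING         # stand-in for MyProc->syncRepState
        self.cancel_pending = False      # stand-in for QueryCancelPending

    def enqueue(self):
        """Mark the backend as waiting before any waker can see it."""
        self.state = WAITING

    def wait(self):
        """Block until released or cancelled; return how the wait ended."""
        outcome = None
        while outcome is None:
            self.latch.clear()           # reset the latch before testing state
            if self.state == WAIT_COMPLETE:
                outcome = "replicated"
            elif self.cancel_pending:
                self.cancel_pending = False
                outcome = "cancelled: committed locally, standby unknown"
            else:
                self.latch.wait()        # no timeout, like WaitLatch(..., -1)
        self.state = NOT_WAITING
        return outcome

    def release(self):                   # what a WAL sender would do
        self.state = WAIT_COMPLETE
        self.latch.set()

    def cancel(self):                    # what a query cancel would trigger
        self.cancel_pending = True
        self.latch.set()


def run(backend, wake):
    """Start the wait in a thread, call `wake`, and return the wait outcome."""
    backend.enqueue()
    result = []
    worker = threading.Thread(target=lambda: result.append(backend.wait()),
                              daemon=True)
    worker.start()
    wake()
    worker.join(timeout=5)
    return result[0]
```

Resetting the latch before testing the state matters: a wake-up that arrives between the test and the sleep leaves the latch set, so the sleep returns immediately instead of missing the signal.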

As mentioned above, the backend has to wait for a signal on its latch.

Who sends that signal? The WAL sender process. The source code and explanation follow:

src/backend/replication/walsender.c

    StartReplication
    WalSndLoop
    ProcessRepliesIfAny
    ProcessStandbyMessage
    ProcessStandbyReplyMessage
        if (!am_cascading_walsender) // not a cascading walsender: call SyncRepReleaseWaiters to set the latches of the backend processes in the wait queue
            SyncRepReleaseWaiters();

SyncRepReleaseWaiters @ src/backend/replication/syncrep.c

    /*
     * Update the LSNs on each queue based upon our latest state. This
     * implements a simple policy of first-valid-standby-releases-waiter.
     *
     * Other policies are possible, which would change what we do here and what
     * perhaps also which information we store as well.
     */
    void
    SyncRepReleaseWaiters(void)
    {
    ...
        // Release the waiters in each queue that now satisfy the condition.
        /*
         * Set the lsn first so that when we wake backends they will release up to
         * this location.
         */
        if (walsndctl->lsn[SYNC_REP_WAIT_WRITE] < MyWalSnd->write)
        {
            walsndctl->lsn[SYNC_REP_WAIT_WRITE] = MyWalSnd->write;
            numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE);
        }
        if (walsndctl->lsn[SYNC_REP_WAIT_FLUSH] < MyWalSnd->flush)
        {
            walsndctl->lsn[SYNC_REP_WAIT_FLUSH] = MyWalSnd->flush;
            numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH);
        }
    ...

SyncRepWakeQueue @ src/backend/replication/syncrep.c

    /*
     * Walk the specified queue from head. Set the state of any backends that
     * need to be woken, remove them from the queue, and then wake them.
     * Pass all = true to wake whole queue; otherwise, just wake up to
     * the walsender's LSN.
     *
     * Must hold SyncRepLock.
     */
    static int
    SyncRepWakeQueue(bool all, int mode)
    {
    ...
        while (proc) // set the latch of each backend process that can be released
        {
            /*
             * Assume the queue is ordered by LSN
             */
            if (!all && walsndctl->lsn[mode] < proc->waitLSN)
                return numprocs;

            /*
             * Move to next proc, so we can delete thisproc from the queue.
             * thisproc is valid, proc may be NULL after this.
             */
            thisproc = proc;
            proc = (PGPROC *) SHMQueueNext(&(WalSndCtl->SyncRepQueue[mode]),
                                           &(proc->syncRepLinks),
                                           offsetof(PGPROC, syncRepLinks));

            /*
             * Set state to complete; see SyncRepWaitForLSN() for discussion of
             * the various states.
             */
            thisproc->syncRepState = SYNC_REP_WAIT_COMPLETE; // condition met: switch to SYNC_REP_WAIT_COMPLETE
    ...
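The interplay of the two functions above (advance the per-mode LSN, then walk the LSN-ordered queue from the head) can be sketched as follows. This is a toy model under stated assumptions: `SyncRepModel`, its method names, and the backend labels are hypothetical, and locking and latch signalling are omitted.

```python
from collections import deque

WRITE, FLUSH = "write", "flush"


class SyncRepModel:
    def __init__(self):
        self.lsn = {WRITE: 0, FLUSH: 0}                 # like walsndctl->lsn[mode]
        self.queue = {WRITE: deque(), FLUSH: deque()}   # waiters ordered by LSN

    def add_waiter(self, mode, name, wait_lsn):
        """Queue a backend, keeping the queue ordered by the LSN it waits for."""
        entries = list(self.queue[mode]) + [(wait_lsn, name)]
        self.queue[mode] = deque(sorted(entries))

    def wake_queue(self, mode, wake_all=False):
        """Pop waiters from the head until one waits beyond lsn[mode]."""
        woken = []
        q = self.queue[mode]
        while q:
            wait_lsn, name = q[0]
            if not wake_all and self.lsn[mode] < wait_lsn:
                break                    # queue is ordered, so the rest wait too
            q.popleft()
            woken.append(name)
        return woken

    def standby_reply(self, write_lsn, flush_lsn):
        """Advance each mode's LSN only if the reply moves it forward."""
        woken = {WRITE: [], FLUSH: []}
        for mode, reported in ((WRITE, write_lsn), (FLUSH, flush_lsn)):
            if self.lsn[mode] < reported:
                self.lsn[mode] = reported
                woken[mode] = self.wake_queue(mode)
        return woken


model = SyncRepModel()
model.add_waiter(FLUSH, "backend-1", 100)
model.add_waiter(FLUSH, "backend-2", 300)
model.add_waiter(WRITE, "backend-3", 200)
first = model.standby_reply(write_lsn=250, flush_lsn=150)
second = model.standby_reply(write_lsn=300, flush_lsn=300)
```

Because the queue is ordered by LSN, the walk can stop at the first waiter whose LSN has not been confirmed yet, which is why the original code returns early instead of scanning the whole queue.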

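How to set the transaction durability level

PostgreSQL supports setting the durability level of transactions within a session.

off: commit does not wait for the WAL to be persisted.

local: commit waits only for the local database's WAL to be persisted.

remote_write: commit waits for the local database's WAL to be persisted and also for the sync standby to finish writing the WAL to its buffer (the standby does not need to have persisted it).

on: commit waits for the local database's WAL to be persisted and also for the sync standby to persist the WAL.

One reminder: no synchronous_commit setting changes the rule that the WAL must be persisted before the dirty data in shared buffers. So whichever value you choose, data consistency is not affected.

    synchronous_commit = off    # synchronization level;
                                # off, local, remote_write, or on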

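How to downgrade synchronous replication

As the code walkthrough above shows, once a backend process has entered the wait loop, it leaves the loop early only on a few signals. After such a downgrade it emits a warning saying that the local WAL has been persisted but that it is unknown whether the sync standby has persisted the WAL.

If you have configured only one standby and made it the synchronous standby, then any network jitter or failure of the sync standby puts synchronous transactions into the wait state.

How do you downgrade?

Method 1.

Modify the configuration file and reload it.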

    $ vi postgresql.conf
    synchronous_commit = local
    $ pg_ctl reload

Then cancel all queries.

    postgres=# select pg_cancel_backend(pid) from pg_stat_activity where pid<>pg_backend_pid();

Receiving the following message means that the transaction committed successfully, and also that it is unknown whether the WAL reached the sync standby.

    WARNING: canceling wait for synchronous replication due to user request
    DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
    COMMIT

    postgres=# show synchronous_commit ;
     synchronous_commit
    --------------------
     local
    (1 row)

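The session also sees that the global synchronous_commit setting is now local.

That completes the downgrade.

Method 2.

Method 1 requires canceling every pid that is currently waiting for WAL sync, which is not very user-friendly.

This can be made friendlier by modifying the code.

In the for loop of SyncRepWaitForLSN, add a check: if the global synchronous_commit setting has changed to local or off, emit a warning and leave the loop. Then nobody has to cancel the queries by hand.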

WARNING: canceling wait for synchronous replication due to user request

DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
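The idea behind method 2 can be sketched as a loop that re-reads the setting on every wake-up. This is a hypothetical model, not a PostgreSQL patch: `wait_with_downgrade` and its three callback parameters are invented for illustration, and the sketch assumes that something wakes the waiting backend after the reload and that it re-reads the configuration, which the article does not spell out.

```python
def wait_with_downgrade(read_sync_commit, is_replicated, wait_latch):
    """Hypothetical wait loop that gives up once sync commit is downgraded.

    read_sync_commit: returns the current synchronous_commit value.
    is_replicated:    returns True once the standby has confirmed the LSN.
    wait_latch:       blocks until the backend is woken up again.
    """
    while True:
        if is_replicated():
            return "replicated"
        if read_sync_commit() in ("local", "off"):
            # A real change would emit the WARNING shown above at this point.
            return "downgraded"
        wait_latch()


# Simulate a standby that never answers while the setting is reloaded
# from "on" to "local" after two wake-ups.
settings = iter(["on", "on", "local"])
wakeups = []
outcome = wait_with_downgrade(
    read_sync_commit=lambda: next(settings),
    is_replicated=lambda: False,
    wait_latch=lambda: wakeups.append(1),
)
```

Checking replication before checking the setting keeps the normal path unchanged: a transaction that the standby has already confirmed still reports success rather than a downgrade warning.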


Original link: https://blog.csdn.net/weixin_42009082/article/details/96481014