
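I. Background

Disruptor is a high-performance queue developed by LMAX, a British foreign-exchange trading company. It was created to solve the latency problem of in-memory queues: performance tests had shown their latency to be on the same order of magnitude as I/O operations. A system built on Disruptor can handle 6 million orders per second on a single thread. It attracted industry attention after a talk at QCon in 2010. In 2011, enterprise software expert Martin Fowler wrote a long article about it, and the same year it won Oracle's Duke's Choice Award.

Today many well-known projects, including Apache Storm, Camel, and Log4j 2, use Disruptor for its performance. It is also used in a number of places at the Meituan technical team, and some project architectures borrow its design mechanisms. This article analyzes how Disruptor works from a practical perspective.

Note that the queue discussed here is an in-memory queue inside a single system, not a distributed queue such as Kafka. The Disruptor features described in this article are limited to version 3.3.4.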

II. Java's Built-in Queues

Before introducing Disruptor, let's look at the problems with the commonly used thread-safe queues built into Java. They are listed in the table below.

Queue	Bounded?	Locking	Data structure
ArrayBlockingQueue	bounded	locked	array
LinkedBlockingQueue	optionally-bounded	locked	linked list
ConcurrentLinkedQueue	unbounded	lock-free	linked list
LinkedTransferQueue	unbounded	lock-free	linked list
PriorityBlockingQueue	unbounded	locked	heap
DelayQueue	unbounded	locked	heap

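A queue is usually built on one of three underlying structures: an array, a linked list, or a heap. Heaps are generally used to implement queues with priority semantics, so we set them aside here.

Among the array- and linked-list-based queues, the typical array-based thread-safe queue is ArrayBlockingQueue, which guarantees thread safety mainly through locking. Linked-list-based thread-safe queues fall into two groups, LinkedBlockingQueue and ConcurrentLinkedQueue. The former also relies on a lock, while the latter, like LinkedTransferQueue in the table above, is implemented without locks using compare-and-swap (hereafter "CAS") on atomic variables.

Among these built-in queues, the lock-free ones are all unbounded (their length cannot be kept within a fixed range), while locking makes bounded queues possible. In systems with very strict stability requirements, a bounded queue is the only choice, because it prevents a fast producer from exhausting memory. At the same time, to reduce the impact of Java garbage collection on performance, array/heap-based data structures are preferred over linked nodes. After this filtering, the only queue that qualifies is ArrayBlockingQueue.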
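To make the "bounded" column concrete, here is a small JDK-only sketch (the class and method names are illustrative, not from the original). It offers ten elements to an ArrayBlockingQueue of capacity 4 and to a ConcurrentLinkedQueue. Once the bounded queue is full, `offer` returns false, which is the back-pressure signal a stable system relies on. The unbounded queue accepts everything.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch: count how many offers each kind of queue accepts.
final class BoundedVsUnbounded {
    /** Offers `attempts` elements and returns how many the queue accepted. */
    static int acceptedOffers(Queue<Integer> queue, int attempts) {
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() returns false (instead of blocking) when a bounded queue is full
            if (queue.offer(i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Bounded: capacity is fixed at construction, so extra offers are rejected
        int bounded = acceptedOffers(new ArrayBlockingQueue<Integer>(4), 10);
        // Unbounded: every offer succeeds, so a fast producer can grow it until memory runs out
        int unbounded = acceptedOffers(new ConcurrentLinkedQueue<Integer>(), 10);
        System.out.println("ArrayBlockingQueue(4) accepted " + bounded);    // 4
        System.out.println("ConcurrentLinkedQueue accepted " + unbounded);  // 10
    }
}
```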

III. Problems with ArrayBlockingQueue

In practice, ArrayBlockingQueue can suffer serious performance problems caused by locking and by false sharing. We analyze both below.

1. Locking

In real-world programs, locking usually hurts performance badly. A thread that fails to acquire a lock is suspended and then resumed when the lock is released. This carries significant overhead and often a long pause, because a thread waiting for a lock cannot do anything else. If the thread holding the lock is delayed, for example by a page fault, a scheduling delay, or something similar, every thread that needs the lock is stuck. And if a blocked thread has a higher priority than the thread holding the lock, priority inversion occurs.

The Disruptor paper describes an experiment in which a test program calls a function that increments a 64-bit counter 500 million times in a loop.

Machine: 2.4 GHz, 6 cores

Method	Time (ms)
Single thread	300
Single thread with CAS	5,700
Single thread with lock	10,000
Single thread with volatile write	4,700
Two threads with CAS	30,000
Two threads with lock	224,000

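CAS is about an order of magnitude slower than single-threaded code with no synchronization (5,700 ms vs 300 ms), and two threads contending on a lock are nearly three orders of magnitude slower (224,000 ms). Code with no synchronization at all is clearly the fastest.

With a single thread, the ranking is: no synchronization > CAS > lock.

With multiple threads, CAS or a lock is required for thread safety, and here CAS beats the lock by a factor of roughly 7 to 8 (30,000 ms vs 224,000 ms).

In short, locking performs worst.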
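For readers who want to see what each row of the table actually synchronizes with, here is a hedged sketch. It is not the paper's benchmark code, all names are hypothetical, and it only checks correctness. Timing these variants meaningfully would need a proper harness such as JMH.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the counter variants from the table (not a benchmark harness).
final class CounterVariants {
    private long plain;                                  // "Single thread": no synchronization
    private final AtomicLong atomic = new AtomicLong();  // "with CAS"
    private long locked;                                 // "with lock"
    private final ReentrantLock lock = new ReentrantLock();

    void incrementPlain() { plain++; }                   // only correct when a single thread uses it

    void incrementCas() { atomic.incrementAndGet(); }    // the JDK retries a CAS internally

    void incrementLocked() {
        lock.lock();
        try {
            locked++;
        } finally {
            lock.unlock();
        }
    }

    static long singleThreadPlain(int n) {
        CounterVariants c = new CounterVariants();
        for (int i = 0; i < n; i++) c.incrementPlain();
        return c.plain;
    }

    /** Runs `threads` threads that each bump the CAS and lock counters `perThread` times. */
    static long[] runContended(int threads, int perThread) {
        CounterVariants c = new CounterVariants();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    c.incrementCas();
                    c.incrementLocked();
                }
            });
            ts[t].start();
        }
        try {
            for (Thread t : ts) t.join();   // join() makes the final values visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {c.atomic.get(), c.locked};
    }

    public static void main(String[] args) {
        long[] r = runContended(2, 1_000_000);
        System.out.println("CAS total = " + r[0] + ", lock total = " + r[1]);
    }
}
```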

a. About locks and CAS

There are generally two ways to guarantee thread safety: locks and atomic variables.

b. Locks


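Figure 1: Achieving thread safety with a lock

The locking approach assumes that threads will conflict. A thread acquires the lock before accessing the data and releases it afterwards. The lock delimits a critical section that only one thread can enter at a time. As the figure above shows, while Thread2 holds the lock and accesses the Entry, Thread1 cannot run the code that accesses the Entry, which guarantees thread safety.

Below is ArrayBlockingQueue's offer method, which uses a lock to guarantee thread safety.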

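// ArrayBlockingQueue.offer as in JDK 7 (JDK 8 calls enqueue(e) instead of insert(e))
public boolean offer(E e) {
    checkNotNull(e);
    final ReentrantLock lock = this.lock;
    lock.lock();                        // enter the critical section
    try {
        if (count == items.length)      // queue is full: reject without blocking
            return false;
        else {
            insert(e);
            return true;
        }
    } finally {
        lock.unlock();                  // always release, even if an exception is thrown
    }
}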

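c. Atomic variables

Atomic variables guarantee atomic operations. An operation either completes entirely or fails and leaves the value as it was, so no intermediate state between the initial state and success is ever visible. A CAS, for example, either compares and swaps successfully or fails. The atomicity is guaranteed by the CPU.

Atomic variables can be used to achieve thread safety. The approach optimistically assumes that there will be no conflict. If no conflict occurs, the operation succeeds directly. If a conflict does occur, the operation fails and is retried until it completes without conflict.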


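Figure 2: Achieving thread safety with CAS on an atomic variable

As the figure shows, Thread1 and Thread2 both want to add 1 to the Entry. With neither a lock nor CAS, Thread1 might read myValue=1 while Thread2 also reads myValue=1. Each adds 1 and writes back, leaving value=2 in the Entry. This does not match expectations: after two increments, the Entry should be 3.

CAS first compares the Entry's current value with the value the thread originally read. If they are equal, the new value is written. If not, the write fails. The CAS is usually wrapped in a while/for loop and retried until the write succeeds.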
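The Figure 2 scenario can be replayed deterministically on one thread with `AtomicInteger.compareAndSet`. The class name is illustrative. Each step stands for what one of the two threads does: both read 1, Thread1's CAS wins, Thread2's CAS fails because the value is no longer 1, and Thread2 re-reads and retries.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Replays the Figure 2 interleaving step by step (illustrative, single-threaded).
final class CasInterleaving {
    /** Returns the Entry value after both "threads" have added 1 using CAS with retry. */
    static int replay() {
        AtomicInteger entry = new AtomicInteger(1);

        int seenBy1 = entry.get(); // Thread1 reads 1
        int seenBy2 = entry.get(); // Thread2 also reads 1

        // Thread1's CAS succeeds: the value is still 1, so it becomes 2
        boolean ok1 = entry.compareAndSet(seenBy1, seenBy1 + 1);

        // Thread2's CAS fails: it expects 1, but the value is now 2
        boolean ok2 = entry.compareAndSet(seenBy2, seenBy2 + 1);

        // On failure, Thread2 re-reads the current value and retries
        while (!ok2) {
            seenBy2 = entry.get();
            ok2 = entry.compareAndSet(seenBy2, seenBy2 + 1);
        }
        if (!ok1) {
            throw new IllegalStateException("Thread1's CAS should have succeeded");
        }
        return entry.get();
    }

    public static void main(String[] args) {
        System.out.println("Entry = " + replay()); // prints "Entry = 3"
    }
}
```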

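The code below is AtomicInteger's getAndAdd method as implemented in JDK 7 (JDK 8 delegates to Unsafe.getAndAddInt instead). CAS is a CPU instruction (for example, cmpxchg on x86), and the CPU guarantees its atomicity.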

/**
 * Atomically adds the given value to the current value.
 *
 * @param delta the value to add
 * @return the previous value
 */
public final int getAndAdd(int delta) {
    for (;;) {
        int current = get();
        int next = current + delta;
        if (compareAndSet(current, next))
            return current;
    }
}

/**
 * Atomically sets the value to the given updated value
 * if the current value {@code ==} the expected value.
 *
 * @param expect the expected value
 * @param update the new value
 * @return true if successful. False return indicates that
 * the actual value was not equal to the expected value.
 */
public final boolean compareAndSet(int expect, int update) {
    return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
}

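Under very high contention, locks can outperform atomic variables, but at more realistic contention levels atomic variables outperform locks. Atomic variables also avoid liveness problems such as deadlock.

2. False sharing

a. What is sharing

The figure below shows the basic cache structure of a computer. L1, L2, and L3 are the level-1, level-2, and level-3 caches. The closer a cache is to the CPU, the faster and smaller it is. L1 is very small but very fast and sits right next to the core that uses it. L2 is larger and slower, and is still used by only a single core. L3 is larger and slower still, and is shared by all cores on one socket. Finally, main memory is shared by all cores on all sockets.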


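Figure 3: CPU and cache hierarchy

When the CPU performs an operation, it first looks for the data in L1, then L2, then L3. If none of these caches has it, the data must be fetched from main memory. The further it has to go, the longer the operation takes. So if you are doing something very frequently, you want to make sure its data stays in L1.

In addition, when threads share a piece of data, a write by one thread has to become visible to the other. In the simplified model, one thread writes the data back to main memory and the other reads it from there. In practice, the cache coherence protocol may serve it from a shared cache such as L3.

The table below gives a rough idea of the cost of accessing each level from the CPU: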

From CPU to	Approx. CPU cycles	Approx. time
Main memory	-	~60-80 ns
QPI bus transfer (between sockets, not drawn)	-	~20 ns
L3 cache	~40-45 cycles	~15 ns
L2 cache	~10 cycles	~3 ns
L1 cache	~3-4 cycles	~1 ns
Register	1 cycle	-

Reading data from main memory is therefore nearly two orders of magnitude slower than reading it from L1.

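b. Cache lines

A cache is made up of many cache lines. Each cache line is typically 64 bytes and effectively references a block of main memory. A Java long is 8 bytes, so one cache line can hold 8 long values.

Whenever the CPU pulls data from main memory, it also loads the neighboring data into the same cache line.

When you access a long array and one value is loaded into the cache, the other values in the same cache line (up to 7 of them) are loaded along with it. That is why you can traverse such an array very quickly. In fact, you can traverse any data structure allocated in a contiguous block of memory very quickly.

The following example compares traversal that takes advantage of cache lines with traversal that does not.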

package com.meituan.FalseSharing;

/**
 * @author gongming
 * @description
 * @date 16/6/4
 */
public class CacheLineEffect {
    // A cache line is typically 64 bytes and a long takes 8 bytes, so each row holds 8 longs
    static long[][] arr;

    public static void main(String[] args) {
        arr = new long[1024 * 1024][];
        for (int i = 0; i < 1024 * 1024; i++) {
            arr[i] = new long[8];
            for (int j = 0; j < 8; j++) {
                arr[i][j] = 0L;
            }
        }
        long sum = 0L;
        long marked = System.currentTimeMillis();
        // Row by row: the 8 longs of a row are contiguous, so each loaded cache line is fully used
        for (int i = 0; i < 1024 * 1024; i++) {
            for (int j = 0; j < 8; j++) {
                sum += arr[i][j];
            }
        }
        System.out.println("Loop times:" + (System.currentTimeMillis() - marked) + "ms");
        marked = System.currentTimeMillis();
        // Column by column: consecutive reads hit different rows, i.e. different cache lines
        for (int i = 0; i < 8; i++) {
            for (int j = 0; j < 1024 * 1024; j++) {
                sum += arr[j][i];
            }
        }
        System.out.println("Loop times:" + (System.currentTimeMillis() - marked) + "ms");
        // Use sum so the JIT cannot remove the loops as dead code
        System.out.println("sum:" + sum);
    }
}

Tested on a machine with a 2 GHz, 2-core CPU and 8 GB of memory, the cache-friendly traversal was about twice as fast.

Results:

Loop times:30ms
Loop times:65ms

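c. What is false sharing

ArrayBlockingQueue has three member variables:

- takeIndex: the index of the next element to be taken
- putIndex: the index of the position where the next element can be inserted
- count: the number of elements in the queue

These three variables can easily end up in the same cache line, yet their modifications have little to do with one another. Every modification therefore invalidates the copies of that line cached by other cores, so the benefit of caching is largely lost.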


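Figure 4: False sharing in ArrayBlockingQueue

As shown above, when a producer thread puts an element into ArrayBlockingQueue, putIndex changes. This invalidates the cache line in the consumer thread's cache, which must then be reloaded from main memory.

This phenomenon is called false sharing: threads that modify unrelated variables sharing a cache line keep invalidating each other's cached copies, so the cache-line mechanism cannot be fully exploited.

The usual fix for false sharing is to increase the spacing between elements, so that elements accessed by different threads land on different cache lines. This trades space for time.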

package com.meituan.FalseSharing;

public class FalseSharing implements Runnable {
    public final static long ITERATIONS = 500L * 1000L * 100L;
    private int arrayIndex = 0;
    private static ValuePadding[] longs;

    public FalseSharing(final int arrayIndex) {
        this.arrayIndex = arrayIndex;
    }

    public static void main(final String[] args) throws Exception {
        for (int i = 1; i < 10; i++) {
            System.gc();
            final long start = System.currentTimeMillis();
            runTest(i);
            System.out.println("Thread num " + i + " duration = " + (System.currentTimeMillis() - start));
        }
    }

    private static void runTest(int NUM_THREADS) throws InterruptedException {
        Thread[] threads = new Thread[NUM_THREADS];
        longs = new ValuePadding[NUM_THREADS];
        for (int i = 0; i < longs.length; i++) {
            longs[i] = new ValuePadding();
        }
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new FalseSharing(i));
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }

    // Each thread repeatedly writes only its own element of the array
    public void run() {
        long i = ITERATIONS + 1;
        while (0 != --i) {
            longs[arrayIndex].value = 0L;
        }
    }

    // 7 longs (56 bytes) on each side of value keep neighbouring values on different cache lines
    public final static class ValuePadding {
        protected long p1, p2, p3, p4, p5, p6, p7;
        protected volatile long value = 0L;
        protected long p9, p10, p11, p12, p13, p14;
        protected long p15;
    }

    // Without padding, several threads' values can share one 64-byte cache line
    public final static class ValueNoPadding {
        // protected long p1, p2, p3, p4, p5, p6, p7;
        protected volatile long value = 0L;
        // protected long p9, p10, p11, p12, p13, p14, p15;
    }
}

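On a 2 GHz, 2-core machine with 8 GB of memory and JDK 1.7.0_45, the padded version (which avoids false sharing) ran roughly 5 to 9 times faster than the unpadded version whenever two or more threads were running.

Results: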

Thread num 1 duration = 447
Thread num 2 duration = 463
Thread num 3 duration = 454
Thread num 4 duration = 464
Thread num 5 duration = 561
Thread num 6 duration = 606
Thread num 7 duration = 684
Thread num 8 duration = 870
Thread num 9 duration = 823

Results after replacing ValuePadding with ValueNoPadding throughout the code:

Thread num 1 duration = 446
Thread num 2 duration = 2549
Thread num 3 duration = 2898
Thread num 4 duration = 3931
Thread num 5 duration = 4716
Thread num 6 duration = 5424
Thread num 7 duration = 4868
Thread num 8 duration = 4595
Thread num 9 duration = 4540

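Note: JDK 1.8 provides a dedicated annotation, @Contended (sun.misc.Contended), for avoiding false sharing, which solves the problem more elegantly. For classes outside the JDK it only takes effect when the JVM is started with -XX:-RestrictContended.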
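Another common way to pad a field, which does not depend on @Contended, is to spread the padding across a class hierarchy. This sketch is modeled on the layout of Disruptor's Sequence class, but the class names here are hypothetical. The motivation usually given is that the JVM may reorder fields within a single class, whereas HotSpot in practice places superclass fields before subclass fields. That keeps the padding on both sides of `value`.

```java
// Hypothetical sketch of inheritance-based padding (modeled on Disruptor's Sequence layout).
class LhsPadding {
    protected long p1, p2, p3, p4, p5, p6, p7;        // 56 bytes laid out before value
}

class PaddedValue extends LhsPadding {
    protected volatile long value;
}

class RhsPadding extends PaddedValue {
    protected long p9, p10, p11, p12, p13, p14, p15; // 56 bytes laid out after value
}

final class PaddedCounter extends RhsPadding {
    PaddedCounter(long initial) {
        value = initial;
    }

    long get() {
        return value;
    }

    void set(long v) {
        value = v;
    }

    public static void main(String[] args) throws InterruptedException {
        // Two threads each write their own counter; padding keeps the counters on separate lines
        PaddedCounter a = new PaddedCounter(0), b = new PaddedCounter(0);
        Thread ta = new Thread(() -> { for (long i = 1; i <= 10_000_000L; i++) a.set(i); });
        Thread tb = new Thread(() -> { for (long i = 1; i <= 10_000_000L; i++) b.set(i); });
        ta.start();
        tb.start();
        ta.join();
        tb.join();
        System.out.println(a.get() + " " + b.get());
    }
}
```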

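IV. Disruptor's Design

Disruptor addresses the slowness of queues with the following designs:

Ring array structure

To avoid garbage collection, it uses an array rather than a linked list, and the array's elements are allocated once and reused. Arrays are also friendlier to the processor's caching mechanism.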
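The idea of "allocate once, reuse forever" can be sketched with a plain JDK array. This is a hypothetical class, not Disruptor's RingBuffer. The slots are filled by a factory at construction time (similar in spirit to the EventFactory in the sample code later in this article). After that, producers overwrite the fields of existing objects instead of creating new ones, so steady-state publishing creates no garbage.

```java
import java.util.function.Supplier;

// Hypothetical sketch: a ring whose slots are created once up front and then reused.
final class PreallocatedRing<E> {
    private final Object[] entries;
    private final int mask;

    PreallocatedRing(int size, Supplier<E> factory) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of 2");
        }
        entries = new Object[size];
        mask = size - 1;
        for (int i = 0; i < size; i++) {
            entries[i] = factory.get(); // every slot is allocated exactly once, here
        }
    }

    @SuppressWarnings("unchecked")
    E get(long sequence) {
        // Sequences `size` apart map to the same, reused object
        return (E) entries[(int) (sequence & mask)];
    }

    public static void main(String[] args) {
        PreallocatedRing<long[]> ring = new PreallocatedRing<>(4, () -> new long[1]);
        for (long seq = 0; seq < 10; seq++) {
            ring.get(seq)[0] = seq; // overwrite the slot in place instead of allocating
        }
        System.out.println(ring.get(9)[0] + " " + (ring.get(1) == ring.get(9))); // prints "9 true"
    }
}
```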

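Element positioning

The array length is a power of two (2^n), so an element's slot can be located quickly with a bitwise operation. Indexes only ever increase, and overflow is not a concern: the index is a long, so even at 1 million operations per second it would take about 300,000 years to run out.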
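Both claims can be checked with a few lines of arithmetic. The helper names are illustrative. For a power-of-two size, `sequence & (size - 1)` gives the same result as `sequence % size` for non-negative sequences. Dividing Long.MAX_VALUE by one million per second and by the number of seconds in a 365-day year gives roughly 292,000 years, which is consistent with the "about 300,000 years" figure.

```java
// Illustrative check of power-of-2 masking and of how long a long sequence lasts.
final class SequenceMath {
    /** Slot index for a non-negative sequence; only valid when bufferSize is a power of 2. */
    static int slotIndex(long sequence, int bufferSize) {
        return (int) (sequence & (bufferSize - 1));
    }

    /** Years until Long.MAX_VALUE is reached at `perSecond` increments per second (365-day years). */
    static double yearsToOverflow(long perSecond) {
        double seconds = (double) Long.MAX_VALUE / perSecond;
        return seconds / (365.0 * 24 * 60 * 60);
    }

    public static void main(String[] args) {
        System.out.println(slotIndex(17, 16) + " == " + (17 % 16));
        System.out.printf("%.0f years%n", yearsToOverflow(1_000_000L));
    }
}
```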

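Lock-free design

Each producer or consumer thread first claims the positions in the array that it may operate on. Once they are claimed, it writes or reads data at those positions directly.

Below, ignoring the ring structure of the array, we describe how the lock-free design works. Wherever several threads compete for positions, thread safety is ensured by CAS on atomic variables rather than by locks.

1. A single producer

Writing data

With a single producer thread, writing data follows a fairly simple flow:

1. Request to write m elements;
2. If m elements can be written, return the highest sequence number. The main check here is whether the write would overwrite elements that have not been read yet;
3. If the request succeeds, the producer writes the elements.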


Figure 5: A single producer writing data
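The three single-producer steps can be sketched with the JDK alone. The class is hypothetical. It assumes exactly one producer thread, so the claim counter needs no CAS, and a consumer that publishes its progress in an AtomicLong. The "would this overwrite an unread element?" check compares the wrap point (the sequence that last occupied the target slot) with the consumer's position. For simplicity it returns -1 instead of throwing, unlike the InsufficientCapacityException used by the multi-producer tryNext shown later.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-producer claim logic; only one thread may call tryNext.
final class SingleProducerSketch {
    private final int bufferSize;
    private long claimed = -1;                        // highest sequence claimed (producer-only, no CAS)
    final AtomicLong consumed = new AtomicLong(-1);   // highest sequence the consumer has finished

    SingleProducerSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Step 1-2: claim m slots; returns the highest claimed sequence, or -1 if it would overwrite unread data. */
    long tryNext(int m) {
        if (m < 1 || m > bufferSize) {
            throw new IllegalArgumentException("m must be in [1, bufferSize]");
        }
        long next = claimed + m;
        long wrapPoint = next - bufferSize;           // sequence that previously used the slot `next` maps to
        if (wrapPoint > consumed.get()) {
            return -1;                                // the consumer has not read that slot yet
        }
        claimed = next;                               // step 3: the producer may now write up to `next`
        return next;
    }

    public static void main(String[] args) {
        SingleProducerSketch p = new SingleProducerSketch(4);
        System.out.println(p.tryNext(4)); // 3: slots 0..3 are free
        System.out.println(p.tryNext(1)); // -1: slot 0 still holds unread sequence 0
        p.consumed.set(0);                // the consumer finishes sequence 0
        System.out.println(p.tryNext(1)); // 4: reuses slot 0
    }
}
```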

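2. Multiple producers

With multiple producers, the question becomes how to prevent several threads from writing the same element. Disruptor's solution is to have each thread claim a different segment of the array. This is easy to do with CAS: when allocating elements, a CAS checks whether the segment has already been handed out.

This raises a new problem, though: how to prevent a reader from reading elements that have not been written yet. For the multi-producer case, Disruptor introduces an additional buffer of the same size as the Ring Buffer, the available Buffer. When a position is written successfully, the corresponding slot in the available Buffer is set to mark it as written. When reading, the consumer scans the available Buffer to determine whether elements are ready.

Reading and writing are described separately below.

a. Reading data

With multiple producers writing, reading becomes considerably more complicated:

1. Request to read up to sequence n;
2. If writer cursor >= n, the highest contiguously readable index is still not known. Starting from the reader cursor, scan the available Buffer up to the first unavailable element, and return the position of the highest contiguously readable element;
3. The consumer reads the elements.

In the figure below, the reader has read up to index 2, while three threads, Writer1/Writer2/Writer3, are writing data to their positions in the RingBuffer. The highest index allocated to the writers is 11.

The reader requests elements 3 through 11 and confirms that writer cursor >= 11. It then scans availableBuffer starting from 3 and finds that the element at index 7 has not been produced yet, so WaitFor(11) returns 6.

The consumer then reads the 4 elements at indexes 3 through 6.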


Figure 6: A consumer reading with multiple producers
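The scan in step 2 can be reproduced with a small JDK-only sketch of an available buffer. The class and method names are hypothetical. Each slot records the sequence last published into it. Storing the sequence rather than a boolean keeps a slot filled on the previous lap around the ring from looking ready. Disruptor itself stores a per-lap flag derived from the sequence, so treat this as a simplification. Running it with sequence 7 unpublished reproduces the example above: the scan from 3 to 11 stops at 6.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch of an "available buffer" for the multi-producer read path.
final class AvailableBufferSketch {
    private final AtomicLongArray published;
    private final int mask;

    AvailableBufferSketch(int size) {
        if (size <= 0 || Integer.bitCount(size) != 1) {
            throw new IllegalArgumentException("size must be a power of 2");
        }
        published = new AtomicLongArray(size);
        mask = size - 1;
        for (int i = 0; i < size; i++) {
            published.set(i, -1L); // nothing has been published yet
        }
    }

    /** Called by a writer after it has finished filling `sequence`. */
    void setAvailable(long sequence) {
        published.set((int) (sequence & mask), sequence);
    }

    boolean isAvailable(long sequence) {
        return published.get((int) (sequence & mask)) == sequence;
    }

    /** Highest s in [lowerBound, upperBound] such that every sequence from lowerBound to s is published. */
    long highestPublished(long lowerBound, long upperBound) {
        for (long s = lowerBound; s <= upperBound; s++) {
            if (!isAvailable(s)) {
                return s - 1; // stop at the first gap
            }
        }
        return upperBound;
    }

    public static void main(String[] args) {
        AvailableBufferSketch buf = new AvailableBufferSketch(16);
        for (long s = 0; s <= 11; s++) {
            if (s != 7) buf.setAvailable(s); // sequence 7 is still being written
        }
        System.out.println(buf.highestPublished(3, 11)); // prints 6
    }
}
```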

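b. Writing data

When multiple producers write:

1. Request to write m elements;
2. If m elements can be written, return the highest sequence number. Each producer is allocated its own exclusive segment;
3. The producer writes the elements, and as it writes each one it sets the corresponding slot in the available Buffer to mark which positions it has written successfully.

In the figure below, two threads, Writer1 and Writer2, both request writable space in the array. Writer1 is allocated indexes 3 to 5, and Writer2 is allocated indexes 6 to 9.

Writer1 writes the element at index 3, sets the corresponding slot in the available Buffer to mark it as written, and then moves on by one to write the element at index 4. Writer2 proceeds the same way, until both have finished writing.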


Figure 7: Producers writing with multiple producers

The code that prevents different producers from writing to the same segment is shown below:

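public long tryNext(int n) throws InsufficientCapacityException
{
    if (n < 1)
    {
        throw new IllegalArgumentException("n must be > 0");
    }

    long current;
    long next;

    do
    {
        current = cursor.get();   // highest sequence claimed by any producer so far
        next = current + n;       // highest sequence this call wants to claim

        // Fail fast if claiming n slots would overwrite entries the consumers have not processed
        if (!hasAvailableCapacity(gatingSequences, n, current))
        {
            throw InsufficientCapacityException.INSTANCE;
        }
    }
    while (!cursor.compareAndSet(current, next)); // another producer moved the cursor: retry

    return next;
}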

The do/while condition cursor.compareAndSet(current, next) checks whether the requested space has already been taken by another producer. If it has, compareAndSet fails and the loop runs again to request write space.
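To see the claim loop hold up under real contention, here is a JDK-only sketch. The class is hypothetical, and the capacity check (hasAvailableCapacity) is deliberately left out, so this is an unbounded claim. Several threads race on one AtomicLong cursor, and each successful CAS hands the winner the range (current, next]. Counting how often each sequence is handed out shows that no two producers ever receive the same slot.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch of the multi-producer CAS claim, without the capacity check.
final class MultiProducerClaim {
    private final AtomicLong cursor = new AtomicLong(-1);

    /** Claims n consecutive sequences and returns the highest one. */
    long next(int n) {
        long current;
        long next;
        do {
            current = cursor.get();
            next = current + n;
        } while (!cursor.compareAndSet(current, next)); // lost the race: re-read and retry
        return next;
    }

    /** Each of `producers` threads claims `batches` batches of n; returns how often each sequence was claimed. */
    static AtomicLongArray run(int producers, int batches, int n) {
        MultiProducerClaim claim = new MultiProducerClaim();
        AtomicLongArray owners = new AtomicLongArray(producers * batches * n);
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            threads[p] = new Thread(() -> {
                for (int b = 0; b < batches; b++) {
                    long hi = claim.next(n);
                    for (long s = hi - n + 1; s <= hi; s++) {
                        owners.incrementAndGet((int) s); // record one more claim of sequence s
                    }
                }
            });
            threads[p].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return owners;
    }

    public static void main(String[] args) {
        AtomicLongArray owners = run(4, 10_000, 3);
        boolean exactlyOnce = true;
        for (int i = 0; i < owners.length(); i++) exactlyOnce &= owners.get(i) == 1;
        System.out.println("every sequence claimed exactly once: " + exactlyOnce);
    }
}
```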

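The consumer side works much like the producer side, so we won't describe it further here.

V. Summary

Disruptor achieves high performance under high concurrency through a carefully engineered lock-free design.

Within Meituan, many high-concurrency scenarios borrow Disruptor's design to reduce contention. Its design ideas can also be extended to distributed scenarios, where lock-free design can improve service performance.

Using Disruptor is a little more involved than using ArrayBlockingQueue, so a code sample is included to help readers get started.

What the code does: every 10 ms it inserts one element into the disruptor, and a consumer reads the data and prints it to the terminal. See the code for the detailed logic.

The code below is based on version 3.3.4 of the Disruptor package.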

package com.meituan.Disruptor;

/**
 * @description Disruptor code sample. Inserts one element into the disruptor every 10 ms;
 * the consumer reads the data and prints it to the terminal.
 */
import com.lmax.disruptor.*;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;

import java.util.concurrent.ThreadFactory;

public class DisruptorMain
{
    public static void main(String[] args) throws Exception
    {
        // Element stored in the queue
        class Element {
            private int value;

            public int get() {
                return value;
            }

            public void set(int value) {
                this.value = value;
            }
        }

        // Thread factory the disruptor uses to create its consumer (event handler) threads
        ThreadFactory threadFactory = new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                return new Thread(r, "simpleThread");
            }
        };

        // Event factory used to preallocate every RingBuffer slot when the RingBuffer is created
        EventFactory<Element> factory = new EventFactory<Element>() {
            @Override
            public Element newInstance() {
                return new Element();
            }
        };

        // Handler that processes each event
        EventHandler<Element> handler = new EventHandler<Element>() {
            @Override
            public void onEvent(Element element, long sequence, boolean endOfBatch)
            {
                System.out.println("Element: " + element.get());
            }
        };

        // Wait strategy used by the consumer while no events are available
        BlockingWaitStrategy strategy = new BlockingWaitStrategy();

        // Size of the RingBuffer (must be a power of 2)
        int bufferSize = 16;

        // Create the disruptor in single-producer mode
        Disruptor<Element> disruptor = new Disruptor<Element>(factory, bufferSize, threadFactory, ProducerType.SINGLE, strategy);

        // Register the EventHandler
        disruptor.handleEventsWith(handler);

        // Start the disruptor's consumer threads
        disruptor.start();

        RingBuffer<Element> ringBuffer = disruptor.getRingBuffer();
        for (int l = 0; true; l++)
        {
            // Claim the sequence of the next available slot
            long sequence = ringBuffer.next();
            try
            {
                // Get the preallocated element at that slot
                Element event = ringBuffer.get(sequence);
                // Set the element's value
                event.set(l);
            }
            finally
            {
                // Always publish, otherwise consumers would wait forever on this sequence
                ringBuffer.publish(sequence);
            }
            Thread.sleep(10);
        }
    }
}

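VI. Performance

Performance was tested with the following topologies (P = producer, C = consumer): Unicast 1P – 1C, Pipeline 1P – 3C, Sequencer 3P – 1C, Multicast 1P – 3C, and Diamond 1P – 3C.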


Throughput results (operations per second) are shown below.

Environment:

CPU: Intel Core i7 860 @ 2.8 GHz without HT
JVM: Java 1.6.0_25 64-bit
OS: Windows 7

Topology	ABQ	Disruptor
Unicast: 1P – 1C	5,339,256	25,998,336
Pipeline: 1P – 3C	2,128,918	16,806,157
Sequencer: 3P – 1C	5,539,531	13,403,268
Multicast: 1P – 3C	1,077,384	9,377,871
Diamond: 1P – 3C	2,113,941	16,143,613

Environment:

CPU:Intel Core i7-2720QM
JVM:Java 1.6.0_25 64-bit
OS:Ubuntu 11.04

Topology	ABQ	Disruptor
Unicast: 1P – 1C	4,057,453	22,381,378
Pipeline: 1P – 3C	2,006,903	15,857,913
Sequencer: 3P – 1C	2,056,118	14,540,519
Multicast: 1P – 3C	260,733	10,860,121
Diamond: 1P – 3C	2,082,725	15,295,197

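Depending on how intense the contention is, Disruptor's throughput in these tests is roughly 5 to 9 times that of ArrayBlockingQueue for most topologies. The extremes are about 2.4x (Sequencer 3P – 1C on Windows 7) and about 40x (Multicast 1P – 3C on Ubuntu).

Latency was measured with the Pipeline: 1P – 3C topology, with the producer waiting 1 ms between two consecutive writes.

Environment: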

CPU:2.2GHz Core i7-2720QM

Java: 1.6.0_25 64-bit

OS:Ubuntu 11.04.

Latency	ArrayBlockingQueue (ns)	Disruptor (ns)
99% observations less than	2,097,152	128
99.99% observations less than	4,194,304	8,192
Max Latency	5,069,086	175,567
Mean Latency	32,757	52
Min Latency	145	29

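Mean latency therefore differs by nearly three orders of magnitude (32,757 ns vs 52 ns).

VII. Wait Strategies

Producer wait strategy

For now there is only one: when there is no free slot, the producer parks for 1 ns and then checks again.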

LockSupport.parkNanos(1);
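Here is a hedged sketch of where that `parkNanos(1)` sits. The class is hypothetical and extends the same wrap-point check as the single-producer case. Instead of failing when the ring is full, the producer loops, parking for 1 ns per attempt, until the consumer frees the slot it needs. In practice the actual pause is usually much longer than 1 ns, because it is bounded by the OS timer granularity.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch: a single producer that parks while the ring is full.
final class ParkingProducer {
    private final int bufferSize;
    private long claimed = -1;
    final AtomicLong consumed = new AtomicLong(-1);

    ParkingProducer(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Claims one slot, waiting while that would overwrite an unread slot; returns the sequence. */
    long next() {
        long next = claimed + 1;
        long wrapPoint = next - bufferSize;
        while (wrapPoint > consumed.get()) {
            LockSupport.parkNanos(1L); // the real pause is typically far longer than 1 ns
        }
        claimed = next;
        return next;
    }

    public static void main(String[] args) {
        ParkingProducer p = new ParkingProducer(2);
        p.next();
        p.next();                                  // the ring of 2 is now full
        Thread consumer = new Thread(() -> {
            LockSupport.parkNanos(50_000_000L);    // simulate a slow consumer (about 50 ms)
            p.consumed.set(0);                     // finish sequence 0, freeing its slot
        });
        consumer.start();
        System.out.println("claimed " + p.next()); // waits for the consumer, then prints "claimed 2"
    }
}
```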

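Consumer wait strategies

Name	Mechanism	Suitable scenarios
BlockingWaitStrategy	Lock	CPU resources are scarce; throughput and latency are not important
BusySpinWaitStrategy	Spin	Retries continuously, avoiding the system calls caused by thread switches, to reduce latency. Recommended when threads are pinned to dedicated CPUs
PhasedBackoffWaitStrategy	Spin + yield + custom strategy	CPU resources are scarce; throughput and latency are not important
SleepingWaitStrategy	Spin + yield + sleep	Good trade-off between performance and CPU usage; latency is uneven
TimeoutBlockingWaitStrategy	Lock with timeout	CPU resources are scarce; throughput and latency are not important
YieldingWaitStrategy	Spin, then yield between retries	Good trade-off between performance and CPU usage; latency is fairly even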
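The "spin, then yield, then sleep" progression in the table can be illustrated with a small consumer-side wait loop, loosely in the spirit of SleepingWaitStrategy. The class name and thresholds are hypothetical, and Disruptor's real implementation differs in detail. Each phase gives up a little latency in exchange for using less CPU.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical backoff wait loop: spin, then yield, then park.
final class BackoffWait {
    static final int SPIN_TRIES = 100;   // illustrative thresholds, not Disruptor's values
    static final int YIELD_TRIES = 100;

    /** Waits until cursor >= sequence and returns the cursor value that satisfied the wait. */
    static long waitFor(long sequence, AtomicLong cursor) {
        int tries = 0;
        long available;
        while ((available = cursor.get()) < sequence) {
            if (tries < SPIN_TRIES) {
                tries++;                     // phase 1: busy spin, lowest latency, burns a core
            } else if (tries < SPIN_TRIES + YIELD_TRIES) {
                tries++;
                Thread.yield();              // phase 2: let other runnable threads use the core
            } else {
                LockSupport.parkNanos(1L);   // phase 3: sleep as briefly as the OS and JVM allow
            }
        }
        return available;
    }

    public static void main(String[] args) {
        AtomicLong cursor = new AtomicLong(-1);
        new Thread(() -> {
            LockSupport.parkNanos(10_000_000L); // the producer publishes after about 10 ms
            cursor.set(5);
        }).start();
        System.out.println("available up to " + waitFor(3, cursor)); // prints "available up to 5"
    }
}
```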

VIII. Log4j 2 Use Case

The biggest advantage of Log4j 2 over Log4j 1 is better performance in multi-threaded concurrent scenarios. This comes from Log4j 2's asynchronous mode, which is built on Disruptor. The WaitStrategy can be configured through the AsyncLogger.WaitStrategy system property, and the default is the Timeout strategy. Below is the official Log4j 2 documentation for this setting:

System Property	Default Value	Description
AsyncLogger.WaitStrategy	Timeout	Valid values: Block, Timeout, Sleep, Yield. Block is a strategy that uses a lock and condition variable for the I/O thread waiting for log events. Block can be used when throughput and low-latency are not as important as CPU resource. Recommended for resource constrained/virtualised environments. Timeout is a variation of the Block strategy that will periodically wake up from the lock condition await() call. This ensures that if a notification is missed somehow the consumer thread is not stuck but will recover with a small latency delay (default 10ms). Sleep is a strategy that initially spins, then uses a Thread.yield(), and eventually parks for the minimum number of nanos the OS and JVM will allow while the I/O thread is waiting for log events. Sleep is a good compromise between performance and CPU resource. This strategy has very low impact on the application thread, in exchange for some additional latency for actually getting the message logged. Yield is a strategy that uses a Thread.yield() for waiting for log events after an initially spinning. Yield is a good compromise between performance and CPU resource, but may use more CPU than Sleep in order to get the message logged to disk sooner.
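This is a hedged sketch of how these settings are typically supplied. It assumes the property names from the Log4j 2 async documentation: Log4jContextSelector set to AsyncLoggerContextSelector for "all loggers async", and AsyncLogger.WaitStrategy from the table above. Log4j 2.10 and later also accept log4j2.-prefixed equivalents. The settings are usually passed as -D JVM options. Setting them from code, as below, only works if it happens before Log4j 2 initializes, and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that sets the async-logger system properties before Log4j 2 starts.
final class AsyncLoggingConfig {
    // Valid values according to the documentation quoted above
    private static final List<String> STRATEGIES = Arrays.asList("Block", "Timeout", "Sleep", "Yield");

    static void configure(String waitStrategy) {
        if (!STRATEGIES.contains(waitStrategy)) {
            throw new IllegalArgumentException("unknown wait strategy: " + waitStrategy);
        }
        // Makes all loggers asynchronous ("loggers all async"), which is backed by Disruptor
        System.setProperty("Log4jContextSelector",
                "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector");
        System.setProperty("AsyncLogger.WaitStrategy", waitStrategy);
    }

    public static void main(String[] args) {
        // Must run before the first LogManager.getLogger(...) call
        configure("Sleep");
        System.out.println(System.getProperty("AsyncLogger.WaitStrategy")); // prints "Sleep"
    }
}
```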


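1. Performance differences

"Loggers all async" uses Disruptor, while the Async Appender uses an ArrayBlockingQueue.

As the figure shows, with a single thread the throughput of loggers all async and the Async Appender is similar. With 64 threads, however, loggers all async achieves about 12 times the throughput of the Async Appender and 68 times that of Sync mode.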


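Figure 8: Performance comparison of Log4j 2 modes

Meituan enforces a company-wide logging standard that requires Log4j 2. This has raised the ceiling on typical single-machine QPS well above a few thousand and greatly improved service performance.

References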

http://brokendreams.iteye.com/blog/2255720
http://ifeve.com/dissecting-disruptor-whats-so-special/
https://github.com/LMAX-Exchange/disruptor/wiki/Performance-Results
https://lmax-exchange.github.io/disruptor/
https://logging.apache.org/log4j/2.x/manual/async.html
https://tech.meituan.com/2016/11/18/disruptor.html


Original link: https://www.cnblogs.com/xuxh120/p/15186964.html