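Interrupt handling - top half (hard interrupts)

Because the APIC interrupt controller is somewhat complex, this article uses the 8259A interrupt controller to walk through how Linux handles interrupts.

Interrupt handling data structures

As mentioned earlier, the 8259A interrupt controller consists of two 8259A-style external chips connected in cascade. Each chip can handle up to 8 different IRQs (interrupt requests), and because one input of the master chip is taken up by the cascade connection to the slave, 15 IRQ lines are available, as shown in the figure below:

In the kernel, each IRQ line is described by an irq_desc_t structure, which is defined as follows: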

typedef struct {
 unsigned int status; /* IRQ status */
 hw_irq_controller *handler;
 struct irqaction *action; /* IRQ action list */
 unsigned int depth; /* nested irq disables */
 spinlock_t lock;
} irq_desc_t;

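The fields of irq_desc_t have the following roles:

status: The state of the IRQ line, stored as flags such as IRQ_DISABLED, IRQ_INPROGRESS and IRQ_PENDING.
handler: Declared as hw_irq_controller, which is a typedef for struct hw_interrupt_type. It holds the hardware-specific operations for the IRQ line. For example, when the 8259A interrupt controller receives an interrupt signal, it has to be sent an acknowledgement before it will accept further interrupt signals; the function that sends this acknowledgement is the ack function in hw_interrupt_type.
action: A pointer to an irqaction structure, the entry point for handling the interrupt. Because one IRQ line can be shared by several devices, action is a linked list in which each node is the interrupt handling entry of one device.
depth: Counts nested disables of the IRQ line, so that repeated disable and enable calls stay balanced.
lock: A spinlock that prevents several CPUs from operating on the same IRQ at the same time.

The hw_interrupt_type structure is hardware specific and is not covered here. Let's look at the irqaction structure instead: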

struct irqaction {
 void (*handler)(int, void *, struct pt_regs *);
 unsigned long flags;
 unsigned long mask;
 const char *name;
 void *dev_id;
 struct irqaction *next;
};

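The fields of irqaction have the following roles:

handler: The entry function of the interrupt handler. Its first argument is the IRQ number, its second argument is the ID of the device, and its third argument holds the register values saved by the kernel when the interrupt occurred.
flags: Flag bits that describe the behaviour of this irqaction, for example whether it can share the IRQ line with other devices.
name: The name of the interrupt handler.
dev_id: The device ID.
next: Each device's interrupt handling entry corresponds to one irqaction structure. Because several devices can share the same IRQ line, the next field links the entries of the different devices together.

The following figure shows how the irq_desc_t structures relate to each other:

Registering an interrupt handler

In the kernel, an interrupt handling entry is registered with the setup_irq() function, whose code is as follows: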

int setup_irq(unsigned int irq, struct irqaction * new)
{
    int shared = 0;
    unsigned long flags;
    struct irqaction *old, **p;
    irq_desc_t *desc = irq_desc + irq;
    ...
    spin_lock_irqsave(&desc->lock,flags);
    p = &desc->action;
    if ((old = *p) != NULL) {
        /* Sharing requires SA_SHIRQ on both the existing and the new action */
        if (!(old->flags & new->flags & SA_SHIRQ)) {
            spin_unlock_irqrestore(&desc->lock,flags);
            return -EBUSY;
        }

        /* Walk to the end of the action list */
        do {
            p = &old->next;
            old = *p;
        } while (old);
        shared = 1;
    }

    *p = new;

    if (!shared) {
        desc->depth = 0;
        desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING);
        desc->handler->startup(irq);
    }
    spin_unlock_irqrestore(&desc->lock,flags);

    register_irq_proc(irq); /* register the entry in the proc filesystem */
    return 0;
}

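setup_irq() is fairly simple: it uses the IRQ number to find the corresponding irq_desc_t structure and appends the new irqaction to the end of that structure's action list. Note that if the line already has a handler and either the existing handler or the new one does not allow sharing (its flags field does not set SA_SHIRQ), the function returns -EBUSY.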
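The sharing rule can be tried out in user space. The following is a minimal sketch, not kernel code: it leaves out locking, the hardware startup call and the proc registration, and the names irq_actions, model_setup_irq and chain_length are made up for this illustration. The SA_SHIRQ value is assumed from Linux 2.4 headers; only the fact that it is a single bit matters here. Unlike setup_irq(), the model clears new->next itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SA_SHIRQ 0x04000000UL /* assumed value; any single bit would do */
#define NR_IRQS  16

struct irqaction {
    void (*handler)(int irq, void *dev_id);
    unsigned long flags;
    const char *name;
    void *dev_id;
    struct irqaction *next;
};

/* Hypothetical stand-in for irq_desc[irq].action */
struct irqaction *irq_actions[NR_IRQS];

/* Model of the list handling in setup_irq(): no locks, no hardware. */
int model_setup_irq(unsigned int irq, struct irqaction *new)
{
    struct irqaction **p;

    assert(irq < NR_IRQS);
    p = &irq_actions[irq];

    if (*p != NULL) {
        /* Both the first registered action and the new one must allow sharing. */
        if (!((*p)->flags & new->flags & SA_SHIRQ))
            return -EBUSY;
        /* Walk to the tail of the list. */
        while (*p != NULL)
            p = &(*p)->next;
    }

    new->next = NULL; /* done by the model; setup_irq() expects the caller to do it */
    *p = new;
    return 0;
}

/* Number of actions currently registered on an IRQ line. */
int chain_length(unsigned int irq)
{
    int n = 0;
    struct irqaction *a;

    assert(irq < NR_IRQS);
    for (a = irq_actions[irq]; a != NULL; a = a->next)
        n++;
    return n;
}
```

Registering two actions that both set SA_SHIRQ on the same line succeeds and keeps them in registration order, while a third action without SA_SHIRQ is rejected with -EBUSY.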

Here is how the timer interrupt handler is registered:

static struct irqaction irq0 = { timer_interrupt, SA_INTERRUPT, 0, "timer", NULL, NULL};

void __init time_init(void)
{
 ...
 setup_irq(0, &irq0);
}

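As shown, the timer interrupt handler uses IRQ 0, its handler function is timer_interrupt(), and it does not allow the IRQ line to be shared (its flags field sets only SA_INTERRUPT, not SA_SHIRQ).

Handling interrupt requests

When an interrupt occurs, the interrupt controller signals the CPU. On receiving the signal, the CPU suspends what it is currently executing and runs the interrupt handling path instead. That path first saves the register values on the stack and then calls do_IRQ() for further processing. The code of do_IRQ() is as follows: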

asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
    int irq = regs.orig_eax & 0xff; /* get the IRQ number */
    int cpu = smp_processor_id();
    irq_desc_t *desc = irq_desc + irq;
    struct irqaction * action;
    unsigned int status;

    kstat.irqs[cpu][irq]++;
    spin_lock(&desc->lock);
    desc->handler->ack(irq);

    status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
    status |= IRQ_PENDING; /* we _want_ to handle it */

    action = NULL;
    if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) { /* IRQ is neither disabled nor already being handled */
        action = desc->action;    /* get the action list */
        status &= ~IRQ_PENDING;   /* clear IRQ_PENDING, which records that the IRQ fired again while it was being handled */
        status |= IRQ_INPROGRESS; /* set IRQ_INPROGRESS: the IRQ is now being handled */
    }
    desc->status = status;

    if (!action) /* line is disabled or a previous instance is still being handled: exit */
        goto out;

    for (;;) {
        spin_unlock(&desc->lock);
        handle_IRQ_event(irq, &regs, action); /* handle the IRQ request */
        spin_lock(&desc->lock);

        if (!(desc->status & IRQ_PENDING)) /* stop unless the IRQ fired again while it was being handled */
            break;
        desc->status &= ~IRQ_PENDING;
    }
    desc->status &= ~IRQ_INPROGRESS;
out:

    desc->handler->end(irq);
    spin_unlock(&desc->lock);

    if (softirq_active(cpu) & softirq_mask(cpu))
        do_softirq(); /* run the interrupt bottom half */
    return 1;
}

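do_IRQ() first uses the IRQ number to find the corresponding irq_desc_t structure. Because the same interrupt can fire more than once, it has to check whether this IRQ is already being handled, by testing whether the status field of the irq_desc_t structure has the IRQ_INPROGRESS flag set. If it is not being handled, do_IRQ() takes the action list and calls handle_IRQ_event() to run the interrupt handlers on that list.

If the same interrupt fires again while it is being handled, the new do_IRQ() invocation only sets the IRQ_PENDING flag in the status field of the irq_desc_t structure and exits. The invocation that is already handling the interrupt sees IRQ_PENDING after handle_IRQ_event() returns and goes around the loop once more. When the interrupt has been handled, do_softirq() is called to run the bottom half (described below).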
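The interplay of IRQ_INPROGRESS and IRQ_PENDING can be modelled in a few lines of user-space C. This is only a sketch under simplifying assumptions: there is no locking, a second arrival of the same IRQ is simulated by calling the model recursively from inside the handler, and the names model_desc, model_handle and model_do_irq are invented for the illustration. The flag values are assumed from Linux 2.4; only their distinctness matters.

```c
#include <assert.h>

#define IRQ_INPROGRESS 1 /* assumed values; any distinct bits work */
#define IRQ_DISABLED   2
#define IRQ_PENDING    4

struct model_desc {
    unsigned int status;
    int handled;         /* how many times the handler body ran */
    int nested_arrivals; /* hypothetical: arrivals to inject while handling */
};

void model_do_irq(struct model_desc *desc);

/* Stand-in for handle_IRQ_event(); may simulate the same IRQ firing again. */
void model_handle(struct model_desc *desc)
{
    desc->handled++;
    if (desc->nested_arrivals > 0) {
        desc->nested_arrivals--;
        model_do_irq(desc); /* sees IRQ_INPROGRESS, only sets IRQ_PENDING */
    }
}

void model_do_irq(struct model_desc *desc)
{
    unsigned int status = desc->status | IRQ_PENDING;
    int run = 0;

    if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
        status &= ~IRQ_PENDING;   /* we are about to handle it ourselves */
        status |= IRQ_INPROGRESS;
        run = 1;
    }
    desc->status = status;
    if (!run)
        return; /* disabled or already in progress: leave the pending mark */

    for (;;) {
        model_handle(desc);
        if (!(desc->status & IRQ_PENDING))
            break; /* nothing arrived while handling */
        desc->status &= ~IRQ_PENDING; /* consume the mark and handle again */
    }
    desc->status &= ~IRQ_INPROGRESS;
}
```

With nested_arrivals set to 2, one call to model_do_irq() runs the handler three times: once for the original interrupt and once for each arrival recorded through IRQ_PENDING. On a disabled line the handler does not run at all and IRQ_PENDING stays set.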

Next, let's look at the implementation of handle_IRQ_event():

int handle_IRQ_event(unsigned int irq, struct pt_regs * regs, struct irqaction * action)
{
    int status;
    int cpu = smp_processor_id();

    irq_enter(cpu, irq);

    status = 1; /* Force the "do bottom halves" bit */

    if (!(action->flags & SA_INTERRUPT)) /* the handler may run with interrupts enabled, so enable them */
        __sti();

    do {
        status |= action->flags;
        action->handler(irq, action->dev_id, regs);
        action = action->next;
    } while (action);
    if (status & SA_SAMPLE_RANDOM)
        add_interrupt_randomness(irq);
    __cli();

    irq_exit(cpu, irq);

    return status;
}

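handle_IRQ_event() is very simple: it walks the action list and calls each handler on it; for the timer interrupt, that means calling timer_interrupt(). Note that if the handler is allowed to run with interrupts enabled (SA_INTERRUPT is not set), interrupts are switched back on first, because the CPU disables them when it receives the interrupt signal.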
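The list walk itself is easy to reproduce outside the kernel. The sketch below assumes a simplified model_action structure and a counting handler, both invented for this example, and it leaves out irq_enter(), irq_exit() and the interrupt enable and disable calls. The accumulated status starts at 0 here instead of 1, and the SA_SAMPLE_RANDOM value is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define SA_SAMPLE_RANDOM 0x10000000UL /* assumed value; only the bit matters */

/* Simplified stand-in for struct irqaction (no pt_regs, no name). */
struct model_action {
    void (*handler)(int irq, void *dev_id);
    unsigned long flags;
    void *dev_id;
    struct model_action *next;
};

/* Hypothetical handler: dev_id points at an int that counts the calls. */
void count_handler(int irq, void *dev_id)
{
    (void)irq;
    ++*(int *)dev_id;
}

/* Call every handler on a shared line and OR their flags together. */
unsigned long model_run_chain(int irq, struct model_action *action)
{
    unsigned long status = 0;

    assert(action != NULL);
    do {
        status |= action->flags;
        action->handler(irq, action->dev_id);
        action = action->next;
    } while (action);

    return status;
}
```

Every handler on the line is called for every interrupt, which is why a real handler on a shared line uses dev_id to find its own device and check whether that device actually raised the interrupt.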

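Interrupt handling - bottom half (soft interrupts)

Because interrupt handling generally runs with interrupts disabled, it must not take long; otherwise later interrupts cannot be handled in time. For this reason Linux splits interrupt handling into two parts, a top half and a bottom half. The top half was covered above; this part describes how the bottom half is executed.

The top half normally does only the most basic work (for example, copying data from the network card into a buffer) and then marks the bottom half that needs to run. Once it is marked, do_softirq() is called to process it.

The softirq mechanism

The bottom half is implemented by the softirq (soft interrupt) mechanism. The Linux kernel has an array named softirq_vec, declared as follows: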

static struct softirq_action softirq_vec[32];

Its element type is the softirq_action structure, defined as follows:

struct softirq_action
{
 void (*action)(struct softirq_action *);
 void *data;
};

The softirq_vec array is the core of the softirq mechanism, and each of its elements represents one kind of softirq. The kernel version discussed here defines only four kinds, however:

enum
{
 HI_SOFTIRQ=0,
 NET_TX_SOFTIRQ,
 NET_RX_SOFTIRQ,
 TASKLET_SOFTIRQ
};

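HI_SOFTIRQ is used for high-priority tasklets and TASKLET_SOFTIRQ for normal tasklets; a tasklet is a kind of task queue built on the softirq mechanism (described below). NET_TX_SOFTIRQ and NET_RX_SOFTIRQ are softirqs reserved for the networking subsystem and are not covered here.

Registering a softirq handler

A softirq handler is registered with the open_softirq() function, whose code is as follows: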

void open_softirq(int nr, void (*action)(struct softirq_action*), void *data)
{
    unsigned long flags;
    int i;

    spin_lock_irqsave(&softirq_mask_lock, flags);
    softirq_vec[nr].data = data;
    softirq_vec[nr].action = action;

    for (i=0; i<NR_CPUS; i++)
        softirq_mask(i) |= (1<<nr);
    spin_unlock_irqrestore(&softirq_mask_lock, flags);
}

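The main job of open_softirq() is to store a softirq handler in the softirq_vec array and to set the matching bit in every CPU's softirq mask.

During system initialization Linux registers two softirq handlers, one for TASKLET_SOFTIRQ and one for HI_SOFTIRQ: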

void __init softirq_init()
{
 ...
 open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
 open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
}

Processing softirqs

Softirqs are processed by the do_softirq() function, whose code is as follows:

asmlinkage void do_softirq()
{
    int cpu = smp_processor_id();
    __u32 active, mask;

    if (in_interrupt())
        return;

    local_bh_disable();

    local_irq_disable();
    mask = softirq_mask(cpu);
    active = softirq_active(cpu) & mask;

    if (active) {
        struct softirq_action *h;

restart:
        softirq_active(cpu) &= ~active;

        local_irq_enable();

        h = softirq_vec;
        mask &= ~active;

        do {
            if (active & 1)
                h->action(h);
            h++;
            active >>= 1;
        } while (active);

        local_irq_disable();

        active = softirq_active(cpu);
        if ((active &= mask) != 0)
            goto retry;
    }

    local_bh_enable();

    return;

retry:
    goto restart;
}

As mentioned above, the softirq_vec array has 32 elements, each corresponding to one kind of softirq. So how does Linux know which softirqs need to run? Each CPU has a variable of type irq_cpustat_t, which is defined as follows:

typedef struct {
 unsigned int __softirq_active;
 unsigned int __softirq_mask;
 ...
} irq_cpustat_t;

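The __softirq_active field records which softirqs have been raised (an int has 32 bits, and each bit stands for one kind of softirq). The __softirq_mask field records which softirqs are enabled on this CPU: open_softirq() sets the corresponding bit when a handler is registered, so a softirq whose mask bit is clear is not run. To request that a softirq be executed, it is enough to set its bit in __softirq_active to 1.

So do_softirq() first reads softirq_mask(cpu) to get the set of softirqs enabled on the current CPU, and softirq_active(cpu) & mask then gives the softirqs that actually need to run. It then tests the bits of that value one by one, starting from bit 0, and calls the handler of each softirq whose bit is set.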
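The bit-by-bit dispatch can be sketched in user space as follows. This is a single-CPU model under stated assumptions: it makes one pass only (the restart loop, the interrupt enable and disable calls and the in_interrupt() check are left out), and model_vec, model_active, model_mask and the model_* functions are names invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SOFTIRQS 32

struct model_softirq {
    void (*action)(struct model_softirq *);
    int runs; /* hypothetical counter, not present in softirq_action */
};

struct model_softirq model_vec[NR_SOFTIRQS];
uint32_t model_active; /* bit n set: softirq n has been raised */
uint32_t model_mask;   /* bit n set: softirq n has a handler and may run */

/* Hypothetical handler that only counts how often it was called. */
void count_run(struct model_softirq *h)
{
    h->runs++;
}

/* Model of open_softirq(): store the handler and enable its bit. */
void model_open_softirq(int nr, void (*action)(struct model_softirq *))
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    model_vec[nr].action = action;
    model_mask |= (uint32_t)1 << nr;
}

/* Model of __cpu_raise_softirq(): mark softirq nr as needing to run. */
void model_raise_softirq(int nr)
{
    assert(nr >= 0 && nr < NR_SOFTIRQS);
    model_active |= (uint32_t)1 << nr;
}

/* One pass of the dispatch loop: run every raised and enabled softirq. */
void model_do_softirq(void)
{
    uint32_t active = model_active & model_mask;
    struct model_softirq *h = model_vec;

    model_active &= ~active; /* these are being serviced now */
    while (active) {
        if (active & 1)
            h->action(h);
        h++;
        active >>= 1;
    }
}
```

Because the loop starts at bit 0 and shifts right, lower-numbered softirqs run first. A raised bit that has no registered handler is filtered out by the mask, so the loop never calls a null action pointer.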

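The tasklet mechanism

As mentioned above, the tasklet mechanism is built on the softirq mechanism: it is essentially a task queue that is executed by a softirq. The Linux kernel has two kinds of tasklets, high-priority tasklets and normal tasklets. Their implementations are almost identical; the only difference is the execution priority. High-priority tasklets run before normal ones because HI_SOFTIRQ is bit 0, the first bit that do_softirq() checks.

A tasklet queue is stored in a tasklet_head structure, and each CPU has its own queue. Here is the definition of tasklet_head: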

struct tasklet_head
{
 struct tasklet_struct *list;
};

struct tasklet_struct
{
 struct tasklet_struct *next;
 unsigned long state;
 atomic_t count;
 void (*func)(unsigned long);
 unsigned long data;
};

As the definitions show, tasklet_head is the head of a queue of tasklet_struct structures, and the func field of tasklet_struct is the pointer to the function that the task executes. Linux defines two tasklet queues, tasklet_vec and tasklet_hi_vec:

struct tasklet_head tasklet_vec[NR_CPUS];
struct tasklet_head tasklet_hi_vec[NR_CPUS];

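Both tasklet_vec and tasklet_hi_vec are arrays with NR_CPUS elements, so every CPU has one high-priority tasklet queue and one normal tasklet queue.

Scheduling a tasklet

To have a tasklet executed, a high-priority tasklet is scheduled with tasklet_hi_schedule() and a normal tasklet with tasklet_schedule(). The two functions are nearly the same, so only one of them is analysed here: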

static inline void tasklet_hi_schedule(struct tasklet_struct *t)
{
    if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
        int cpu = smp_processor_id();
        unsigned long flags;

        local_irq_save(flags);
        t->next = tasklet_hi_vec[cpu].list;
        tasklet_hi_vec[cpu].list = t;
        __cpu_raise_softirq(cpu, HI_SOFTIRQ);
        local_irq_restore(flags);
    }
}

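The argument is a pointer to the tasklet_struct that should be executed. tasklet_hi_schedule() first checks whether this tasklet has already been scheduled (the TASKLET_STATE_SCHED bit). If it has not, the function inserts it at the head of the current CPU's tasklet_hi_vec queue and calls __cpu_raise_softirq(cpu, HI_SOFTIRQ) to tell the softirq mechanism that the HI_SOFTIRQ softirq needs to run. Here is the implementation of __cpu_raise_softirq():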

static inline void __cpu_raise_softirq(int cpu, int nr)
{
 softirq_active(cpu) |= (1<<nr);
}

As shown, __cpu_raise_softirq() simply sets bit nr of the __softirq_active field of the irq_cpustat_t structure to 1. For tasklet_hi_schedule(), that is the HI_SOFTIRQ bit (bit 0).

As described earlier, Linux registers two softirqs during initialization, TASKLET_SOFTIRQ and HI_SOFTIRQ:

void __init softirq_init()
{
 ...
 open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
 open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
}

So when the HI_SOFTIRQ bit (bit 0) of the __softirq_active field of the irq_cpustat_t structure is set to 1, the softirq mechanism calls tasklet_hi_action(). Here is its implementation:

static void tasklet_hi_action(struct softirq_action *a)
{
    int cpu = smp_processor_id();
    struct tasklet_struct *list;

    local_irq_disable();
    list = tasklet_hi_vec[cpu].list;
    tasklet_hi_vec[cpu].list = NULL;
    local_irq_enable();

    while (list != NULL) {
        struct tasklet_struct *t = list;

        list = list->next;

        if (tasklet_trylock(t)) {
            if (atomic_read(&t->count) == 0) {
                clear_bit(TASKLET_STATE_SCHED, &t->state);

                t->func(t->data); /* call the tasklet's handler function */
                tasklet_unlock(t);
                continue;
            }
            tasklet_unlock(t);
        }
        ...
    }
}

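tasklet_hi_action() is very simple: it detaches the current CPU's tasklet_hi_vec queue and then walks it, calling the handler function of every tasklet that can be locked and whose count is 0.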
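Scheduling and running can be put together in a small user-space model. It is a sketch only: it covers a single CPU, replaces the TASKLET_STATE_SCHED bit with a plain int, and leaves out tasklet_trylock(), the count check and the softirq raise. The names model_tasklet, model_queue, model_schedule, model_run and log_data are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct model_tasklet {
    struct model_tasklet *next;
    int scheduled; /* stands in for the TASKLET_STATE_SCHED bit */
    void (*func)(unsigned long);
    unsigned long data;
};

/* Single-CPU stand-in for tasklet_hi_vec[cpu].list */
struct model_tasklet *model_queue;

/* Hypothetical log so the execution order can be inspected. */
int model_log[8];
int model_log_len;

void log_data(unsigned long data)
{
    assert(model_log_len < 8);
    model_log[model_log_len++] = (int)data;
}

/* Queue a tasklet unless it is already queued. */
void model_schedule(struct model_tasklet *t)
{
    if (t->scheduled)
        return;
    t->scheduled = 1;
    t->next = model_queue; /* insert at the head, as tasklet_hi_schedule() does */
    model_queue = t;
}

/* Detach the whole queue, then run each tasklet once. */
void model_run(void)
{
    struct model_tasklet *list = model_queue;

    model_queue = NULL;
    while (list != NULL) {
        struct model_tasklet *t = list;

        list = list->next;
        t->scheduled = 0; /* cleared before the call, so func may reschedule */
        t->func(t->data);
    }
}
```

Two things follow from this structure. Scheduling the same tasklet twice before it runs queues it only once, and because new entries go to the head of the list, tasklets run in the reverse of the order in which they were scheduled.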