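When a large number of file descriptors has to be monitored, both the select and the poll system calls fall short. poll has to copy every file descriptor into the kernel on each call, and the kernel does not keep them between calls, which is what makes poll inefficient.

epoll improves on this. The fds are not passed in at epoll_wait time; instead, each fd is handed to the kernel once through epoll_ctl and they are then all waited on together, which removes the needless repeated copying.

In addition, epoll_wait does not add current to the device wait queue of each fd in turn. Instead, when a device wait queue is woken up, a callback function is invoked (this of course requires a "wake-up callback" mechanism) that puts the fd that produced the event on a list, and the fds on that list are what gets returned. epoll also implements its own filesystem, the eventpoll filesystem.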
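From user space, the "register once, then wait" pattern described above might look like the following minimal sketch. The function name `demo_epoll_pipe` and the use of a pipe as the monitored file are assumptions made for illustration; only `epoll_create`, `epoll_ctl` and `epoll_wait` come from the text.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Register the read end of a pipe once with epoll_ctl(), make it readable,
 * then collect the event with epoll_wait().
 * Returns the number of ready events, or -1 if a setup step fails.
 */
int demo_epoll_pipe(void)
{
    int pipefd[2];
    int epfd, n = -1;
    struct epoll_event ev = {0};
    struct epoll_event ready[4];

    if (pipe(pipefd) < 0)
        return -1;
    epfd = epoll_create(1);               /* size only has to be positive */
    if (epfd < 0)
        goto out_pipe;

    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    /* The fd is handed to the kernel once, here, not on every wait. */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) < 0)
        goto out_ep;

    if (write(pipefd[1], "x", 1) != 1)    /* make the read end readable */
        goto out_ep;

    n = epoll_wait(epfd, ready, 4, 1000); /* wait at most 1000 ms */
    if (n == 1)
        assert(ready[0].data.fd == pipefd[0]);

out_ep:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```

Calling `demo_epoll_pipe()` from a test program should report one ready event, because the byte written to the pipe makes its read end readable before the wait starts.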

Epoll from the perspective of the Linux source code: seeing the essence behind the phenomena

epoll initialization

When the system boots, epoll performs its initialization:

static int __init eventpoll_init(void)
{
    mutex_init(&epmutex);

    /* Initialize the structure used to perform safe poll wait head wake ups */
    ep_poll_safewake_init(&psw);

    /* Allocates slab cache used to allocate "struct epitem" items */
    epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem),
            0, SLAB_HWCACHE_ALIGN|EPI_SLAB_DEBUG|SLAB_PANIC,
            NULL);

    /* Allocates slab cache used to allocate "struct eppoll_entry" */
    pwq_cache = kmem_cache_create("eventpoll_pwq",
            sizeof(struct eppoll_entry), 0,
            EPI_SLAB_DEBUG|SLAB_PANIC, NULL);

    return 0;
}
fs_initcall(eventpoll_init);

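The code above initializes a few data structures. The comments in fs/eventpoll.c describe three kinds of locks and when each one is used:

1. epmutex (mutex): covers the case where the user closes a file descriptor without calling EPOLL_CTL_DEL
2. ep->mtx (mutex): held on paths that move data between user space and kernel space and may sleep
3. ep->lock (spinlock): held on paths shared between the kernel and device interrupt context, i.e. the poll callback

Next, the slab allocator is used to create caches for dynamic allocation. The first cache holds struct epitem: whenever an fd is added to an epoll instance, an epitem is created for it, and it is the basic data structure the kernel manages. The second cache holds struct eppoll_entry.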

Kernel data structures

Inside the kernel, epoll mainly maintains two data structures, eventpoll and epitem. An eventpoll represents one epoll instance, and an epitem represents the events of one monitored I/O object.

struct epitem {
    /* RB tree node used to link this structure to the eventpoll RB tree */
    struct rb_node rbn; /* links this item into the red-black tree managed by eventpoll */

    /* List header used to link this structure to the eventpoll ready list */
    struct list_head rdllink; /* links this item into the eventpoll.rdllist ready queue */

    /*
     * Works together "struct eventpoll"->ovflist in keeping the
     * single linked chain of items.
     */
    struct epitem *next; /* used by the singly linked list (ovflist) in the main structure */

    /* The file descriptor information this item refers to */
    struct epoll_filefd ffd; /* the monitored file descriptor (fd + file), used as the red-black tree key */

    /* Number of active wait queue attached to poll operations */
    int nwait; /* number of wait queues attached by poll operations */

    /* List containing poll wait queues */
    /*
     * Doubly linked list holding the wait queue entries of the monitored file,
     * similar in role to the poll_table in select/poll. Several events may be
     * monitored on the same file and they may belong to different wait queues,
     * hence the list.
     */
    struct list_head pwqlist;

    /* The "container" of this item */
    struct eventpoll *ep; /* owner of this epitem (many epitems belong to one eventpoll) */

    /* List header used to link this item to the "struct file" items list */
    /*
     * Doubly linked list node that ties this item to the struct file of the
     * monitored descriptor: the file has f_ep_links, which records every
     * epoll item that monitors it.
     */
    struct list_head fllink;

    /* The structure that describe the interested events and the source fd */
    struct epoll_event event; /* the registered events of interest, i.e. the user-space epoll_event */
};

The main data structure behind each epoll fd is:

struct eventpoll {
    /* Protect the this structure access */
    /*
     * Spinlock taken inside the kernel so that several threads/processes can
     * operate on this structure concurrently; it mainly protects the ready list.
     */
    spinlock_t lock;

    /*
     * This mutex is used to ensure that files are not removed
     * while epoll is using them. This is held during the event
     * collection loop, the file cleanup path, the epoll file exit
     * code and the ctl operations.
     */
    struct mutex mtx; /* prevents removal while in use */

    /* Wait queue used by sys_epoll_wait() */
    wait_queue_head_t wq; /* wait queue used by sys_epoll_wait() */

    /* Wait queue used by file->poll() */
    wait_queue_head_t poll_wait; /* wait queue used by file->poll() */

    /* List of ready file descriptors */
    struct list_head rdllist; /* list of ready events */

    /* RB tree root used to store monitored fd structs */
    struct rb_root rbr; /* root of the tree of file descriptors this epoll instance watches */

    /*
     * This is a single linked list that chains all the "struct epitem" that
     * happened while transferring ready events to userspace w/out
     * holding ->lock.
     */
    struct epitem *ovflist; /* descriptors whose events fire while ready events are being copied to user space are chained here */
};
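The relationship between `rdllink` in each epitem and `rdllist` in the eventpoll can be illustrated with a small user-space model. This is only a sketch: the `toy_*` names are hypothetical stand-ins, the list code is a simplified imitation of the kernel's `struct list_head`, and the red-black tree and locking are left out entirely.

```c
#include <assert.h>

/* Stand-in for the kernel's struct list_head: a circular doubly linked node. */
struct toy_list { struct toy_list *next, *prev; };

void toy_list_init(struct toy_list *n) { n->next = n; n->prev = n; }

/* A node that points to itself is on no list (the idea behind ep_is_linked()). */
int toy_is_linked(const struct toy_list *n) { return n->next != n; }

void toy_list_add_tail(struct toy_list *item, struct toy_list *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

struct toy_eventpoll { struct toy_list rdllist; };  /* one per epoll instance */

struct toy_epitem {                                 /* one per monitored fd */
    struct toy_list rdllink;
    int fd;
    struct toy_eventpoll *ep;
};

void toy_epitem_init(struct toy_epitem *epi, struct toy_eventpoll *ep, int fd)
{
    toy_list_init(&epi->rdllink);
    epi->fd = fd;
    epi->ep = ep;
}

/* Queue the item on its owner's ready list; returns 0 if it was already there. */
int toy_mark_ready(struct toy_epitem *epi)
{
    if (toy_is_linked(&epi->rdllink))
        return 0;
    toy_list_add_tail(&epi->rdllink, &epi->ep->rdllist);
    return 1;
}

int toy_ready_count(const struct toy_eventpoll *ep)
{
    int n = 0;
    const struct toy_list *p;

    for (p = ep->rdllist.next; p != &ep->rdllist; p = p->next)
        n++;
    return n;
}

/* Usage: two items, one of them reported ready twice. */
void toy_ready_list_demo(void)
{
    struct toy_eventpoll ep;
    struct toy_epitem a, b;

    toy_list_init(&ep.rdllist);
    toy_epitem_init(&a, &ep, 3);
    toy_epitem_init(&b, &ep, 4);
    assert(toy_mark_ready(&a) == 1);
    assert(toy_mark_ready(&a) == 0);   /* already linked, not queued again */
    assert(toy_mark_ready(&b) == 1);
    assert(toy_ready_count(&ep) == 2);
}
```

Embedding the list node inside the item is what lets one epitem sit on the ready list without any extra allocation, and the "is it linked?" test is what keeps an fd from being queued twice.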

Function call relationships

epoll_create

Each eventpoll is created through epoll_create():

asmlinkage long sys_epoll_create(int size)
{
    int error, fd = -1;
    struct eventpoll *ep;

    DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_create(%d)\n",
             current, size));

    /*
     * Sanity check on the size parameter, and create the internal data
     * structure ( "struct eventpoll" ).
     */
    error = -EINVAL;
    /* allocate and initialize ep */
    if (size <= 0 || (error = ep_alloc(&ep)) < 0) {
        fd = error;
        goto error_return;
    }

    /*
     * Creates all the items needed to setup an eventpoll file. That is,
     * a file structure and a free file descriptor.
     */
    /*
     * anon_inode_getfd() creates a new struct file, so an epoll instance can
     * be treated as a file (an anonymous one, since it belongs to no real
     * filesystem). The main structure struct eventpoll *ep is stored in
     * file->private_data, where sys_epoll_ctl later retrieves it.
     */
    fd = anon_inode_getfd("[eventpoll]", &eventpoll_fops, ep);
    if (fd < 0)
        ep_free(ep);

error_return:
    DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_create(%d) = %d\n",
             current, size, fd));

    return fd;
}
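The `size <= 0` check above is visible from user space. The following sketch (the helper name `epoll_create_errno` is an assumption for illustration) reports which errno, if any, `epoll_create` sets for a given size.

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if epoll_create(size) succeeds, otherwise the errno it set. */
int epoll_create_errno(int size)
{
    int fd = epoll_create(size);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Usage: non-positive sizes are rejected, a positive size is accepted. */
void epoll_create_size_demo(void)
{
    assert(epoll_create_errno(0) == EINVAL);
    assert(epoll_create_errno(-5) == EINVAL);
    assert(epoll_create_errno(1) == 0);
}
```

Apart from this sanity check, the listing above never uses size again, which is why a caller can pass any positive value.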

epoll_ctl

asmlinkage long sys_epoll_ctl(int epfd, int op, int fd,
                  struct epoll_event __user *event)
{
    int error;
    struct file *file, *tfile;
    struct eventpoll *ep;
    struct epitem *epi;
    struct epoll_event epds;

    DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_ctl(%d, %d, %d, %p)\n",
             current, epfd, op, fd, event));

    error = -EFAULT;
    /* validate the arguments and copy the __user *event into epds */
    if (ep_op_has_event(op) &&
        copy_from_user(&epds, event, sizeof(struct epoll_event)))
        goto error_return;

    /* Get the "struct file *" for the eventpoll file */
    error = -EBADF;
    file = fget(epfd); /* the file object for the epoll fd */
    if (!file)
        goto error_return;

    /* Get the "struct file *" for the target file */
    tfile = fget(fd); /* the file object for the target fd */
    if (!tfile)
        goto error_fput;

    /* The target file descriptor must support poll */
    error = -EPERM;
    if (!tfile->f_op || !tfile->f_op->poll)
        goto error_tgt_fput;

  ...

    /*
     * At this point it is safe to assume that the "private_data" contains
     * our own data structure.
     */
    ep = file->private_data; /* stored at create time (by anon_inode_getfd) and retrieved here */

    mutex_lock(&ep->mtx);

    /*
     * Try to lookup the file inside our RB tree, Since we grabbed "mtx"
     * above, we can be sure to be able to use the item looked up by
     * ep_find() till we release the mutex.
     */
    epi = ep_find(ep, tfile, fd); /* look the fd up in ep's red-black tree to guard against adding it twice */

    switch (op) {
    case EPOLL_CTL_ADD: /* add a new fd to monitor */
        if (!epi) {
            epds.events |= POLLERR | POLLHUP; /* POLLERR and POLLHUP are always included */

            error = ep_insert(ep, &epds, tfile, fd); /* insert the epitem for this fd into ep's red-black tree */
        } else /* duplicate add: the fd is already in ep's red-black tree */
            error = -EEXIST;
        break;

  ...

    }

  ...

    return error;
}
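Two of the error paths in sys_epoll_ctl can be observed from user space: adding the same fd twice (`-EEXIST`) and adding a file whose f_op has no poll method (`-EPERM`). The sketch below is illustrative only. The helper names are hypothetical, and using a directory as the "file without poll" is an assumption that holds on common Linux filesystems.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Tries EPOLL_CTL_ADD; returns 0 on success or the errno on failure. */
int try_add(int epfd, int fd)
{
    struct epoll_event ev = {0};

    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
        return errno;
    return 0;
}

/*
 * results[0]: first add of a pipe fd
 * results[1]: second add of the same fd
 * results[2]: add of a directory fd (no poll support assumed)
 * Returns 0 if the setup worked, -1 otherwise.
 */
int demo_ctl_add_errors(int results[3])
{
    int pipefd[2];
    int epfd, dirfd;

    assert(results != NULL);
    epfd = epoll_create(1);
    if (epfd < 0)
        return -1;
    dirfd = open("/", O_RDONLY);
    if (dirfd < 0) {
        close(epfd);
        return -1;
    }
    if (pipe(pipefd) < 0) {
        close(dirfd);
        close(epfd);
        return -1;
    }

    results[0] = try_add(epfd, pipefd[0]);
    results[1] = try_add(epfd, pipefd[0]);
    results[2] = try_add(epfd, dirfd);

    close(pipefd[0]);
    close(pipefd[1]);
    close(dirfd);
    close(epfd);
    return 0;
}
```

On a typical Linux system the three results are expected to be 0, EEXIST and EPERM, matching the ep_find() lookup and the f_op->poll check in the listing.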

ep_insert is implemented as follows:

static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
             struct file *tfile, int fd)
{
    int error, revents, pwake = 0;
    unsigned long flags;
    struct epitem *epi;
    struct ep_pqueue epq;

    error = -ENOMEM;
    /* allocate an epitem to hold each fd that is added */
    if (!(epi = kmem_cache_alloc(epi_cache, GFP_KERNEL)))
        goto error_return;

    /* Item initialization follow here ... */
    /* initialize the structure */
    INIT_LIST_HEAD(&epi->rdllink);
    INIT_LIST_HEAD(&epi->fllink);
    INIT_LIST_HEAD(&epi->pwqlist);
    epi->ep = ep;
    ep_set_ffd(&epi->ffd, tfile, fd);
    epi->event = *event;
    epi->nwait = 0;
    epi->next = EP_UNACTIVE_PTR;

    /* Initialize the poll table using the queue callback */
    epq.epi = epi;
    /* install the poll callback */
    init_poll_funcptr(&epq.pt, ep_ptable_queue_proc);

    /*
     * Attach the item to the poll hooks and get current event bits.
     * We can safely use the file* here because its usage count has
     * been increased by the caller of this function. Note that after
     * this operation completes, the poll callback can start hitting
     * the new item.
     */
    /*
     * Call the file's poll function to get the current event bits. The call
     * also serves to invoke the registered ep_ptable_queue_proc() (it is
     * called from poll_wait()).
     * If fd is a socket, f_op is socket_file_ops and the poll function is
     * sock_poll(). For a TCP socket this in turn calls tcp_poll(). The poll
     * call reports the current state of the descriptor, stored in revents.
     * Inside the poll handler (tcp_poll()), sock_poll_wait() is called, and
     * sock_poll_wait() calls the function that epq.pt.qproc points to,
     * i.e. ep_ptable_queue_proc().
     */
    revents = tfile->f_op->poll(tfile, &epq.pt);

    /* Add the current item to the list of active epoll hook for this file */
    spin_lock(&tfile->f_ep_lock);
    list_add_tail(&epi->fllink, &tfile->f_ep_links);
    spin_unlock(&tfile->f_ep_lock);

    /*
     * Add the current item to the RB tree. All RB tree operations are
     * protected by "mtx", and ep_insert() is called with "mtx" held.
     */
    ep_rbtree_insert(ep, epi); /* insert this epi into ep's red-black tree */

    /* We have to drop the new item inside our item list to keep track of it */
    spin_lock_irqsave(&ep->lock, flags);

    /* If the file is already "ready" we drop it inside the ready list */
    /*
     * revents & event->events: the events returned by f_op->poll include at
     *   least one event the user asked for.
     * !ep_is_linked(&epi->rdllink): epi is not yet on the ready list
     *   (ep_is_linked() tests whether the list node is already linked).
     */

    /*
     * If the monitored file is already ready and the epitem is not yet on the
     * ready list, add it to the ready list. If a process is waiting for the
     * file to become ready, wake one waiter.
     */
    if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) {
        /* add epi to ep's ready list */
        list_add_tail(&epi->rdllink, &ep->rdllist);

        /* Notify waiting tasks that events are available */
        /*
         * If a process is sleeping in epoll_wait, waiting for a file to
         * become ready, wake one such process. waitqueue_active(q) returns 1
         * if wait queue q has waiters and 0 otherwise.
         */
        if (waitqueue_active(&ep->wq))
            wake_up_locked(&ep->wq);
        /*
         * If a process is waiting on the eventpoll file itself (for example
         * because this epoll fd is in turn being polled), increment the
         * temporary pwake. When pwake is non-zero, those waiters are woken
         * after the lock is released.
         */
        if (waitqueue_active(&ep->poll_wait))
            pwake++;
    }

    spin_unlock_irqrestore(&ep->lock, flags);

    /* We have to call this outside the lock */
    if (pwake)
        /* wake the processes waiting for the eventpoll file to become ready */
        ep_poll_safewake(&psw, &ep->poll_wait);

    DNPRINTK(3, (KERN_INFO "[%p] eventpoll: ep_insert(%p, %p, %d)\n",
             current, ep, tfile, fd));

    return 0;

...
}

init_poll_funcptr(&epq.pt, ep_ptable_queue_proc); stores ep_ptable_queue_proc in the qproc field of epq.pt, and revents = tfile->f_op->poll(tfile, &epq.pt); then passes that poll table to the file's poll function. Together, the two calls register ep_ptable_queue_proc with the target file.

typedef struct poll_table_struct {
    poll_queue_proc qproc;
    unsigned long key;
} poll_table;

When f_op->poll(tfile, &epq.pt) runs, the file's XXX_poll(tfile, &epq.pt) function calls poll_wait(), and poll_wait() calls the epq.pt.qproc function, i.e. ep_ptable_queue_proc.
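The indirection through qproc can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel code: all `toy_*` names are hypothetical, the wait queue is reduced to a counter, and `toy_container_of` imitates how the kernel recovers the enclosing `ep_pqueue` from the embedded poll table.

```c
#include <assert.h>
#include <stddef.h>

struct toy_wait_head { int nr_waiters; };   /* stand-in for wait_queue_head_t */

struct toy_poll_table;
typedef void (*toy_queue_proc_t)(struct toy_wait_head *whead,
                                 struct toy_poll_table *pt);

struct toy_poll_table { toy_queue_proc_t qproc; };

/* Like ep_pqueue: the poll table embedded next to the caller's own data. */
struct toy_pqueue {
    struct toy_poll_table pt;
    int item_id;
    int registered;
};

#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Like poll_wait(): just forwards to whatever callback the caller installed. */
void toy_poll_wait(struct toy_wait_head *whead, struct toy_poll_table *pt)
{
    if (pt && pt->qproc && whead)
        pt->qproc(whead, pt);
}

/* Like a driver's poll method: registers the waiter, then reports events. */
unsigned int toy_driver_poll(struct toy_wait_head *whead,
                             struct toy_poll_table *pt)
{
    toy_poll_wait(whead, pt);
    return 0;   /* nothing ready yet in this model */
}

/* Plays the role of ep_ptable_queue_proc(): hooks onto the device queue. */
void toy_queue_proc(struct toy_wait_head *whead, struct toy_poll_table *pt)
{
    struct toy_pqueue *pq = toy_container_of(pt, struct toy_pqueue, pt);

    whead->nr_waiters++;
    pq->registered++;
}

/* Usage: returns how many waiters ended up on the device queue. */
int toy_register_demo(void)
{
    struct toy_wait_head dev = { 0 };
    struct toy_pqueue pq = { { toy_queue_proc }, 7, 0 };
    unsigned int revents = toy_driver_poll(&dev, &pq.pt);

    assert(revents == 0);
    assert(pq.registered == 1);
    return dev.nr_waiters;
}
```

The driver never needs to know who is polling it. It only calls the poll_wait-style helper, and the caller decides through qproc what "start waiting on this queue" means.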

The ep_ptable_queue_proc function is as follows:

/*
 * Called from the target file's poll function (through poll_wait()); it adds
 * epoll's callback to the target file's wait queue. If the monitored file is
 * a socket, whead is the address of the sk_sleep member of struct sock.
 */
static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
                 poll_table *pt)
{
    /* recover the epi field of the struct ep_pqueue that contains pt */
    struct epitem *epi = ep_item_from_epqueue(pt);
    struct eppoll_entry *pwq;

    if (epi->nwait >= 0 && (pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL))) {
        init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
        pwq->whead = whead;
        pwq->base = epi;
        add_wait_queue(whead, &pwq->wait);
        list_add_tail(&pwq->llink, &epi->pwqlist);
        epi->nwait++;
    } else {
        /* We have to signal that an error occurred */
        /* allocation failed, or an error was already recorded: set nwait to -1 to flag the error */
        epi->nwait = -1;
    }
}

struct eppoll_entry is defined as follows:

struct eppoll_entry {
    struct list_head llink;
    struct epitem *base;
    wait_queue_t wait;
    wait_queue_head_t *whead;
};

ep_ptable_queue_proc does the work of attaching an epitem to the wait queue of a particular file.
ep_ptable_queue_proc takes three parameters:

struct file *file;              the file object for this fd

wait_queue_head_t *whead;       the device wait queue for this fd (the same as mydev->wait_address in select)

poll_table *pt;                 the epq.pt passed in f_op->poll(tfile, &epq.pt)

ep_ptable_queue_proc introduces another very important data structure, eppoll_entry. An eppoll_entry ties an epitem to the callback (ep_poll_callback) that runs when an event occurs for that epitem. First, whead of the eppoll_entry is pointed at the fd's device wait queue (the same as wait_address in select). Then base is set to point to the epitem. Finally, add_wait_queue attaches the eppoll_entry to the fd's device wait queue.

Because ep_ptable_queue_proc installs ep_poll_callback as the wait queue callback, when data arrives from the device hardware and the hardware interrupt handler wakes up the waiters on that wait queue, the wake-up function ep_poll_callback is called:

static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
    int pwake = 0;
    unsigned long flags;
    struct epitem *epi = ep_item_from_wait(wait);
    struct eventpoll *ep = epi->ep;

    spin_lock_irqsave(&ep->lock, flags);

    /*
     * If the event mask does not contain any poll(2) event, we consider the
     * descriptor to be disabled. This condition is likely the effect of the
     * EPOLLONESHOT bit that disables the descriptor when an event is received,
     * until the next EPOLL_CTL_MOD will be issued.
     */
    if (!(epi->event.events & ~EP_PRIVATE_BITS))
        goto out_unlock;

  ...

    /* If this file is already in the ready list we exit soon */
    if (ep_is_linked(&epi->rdllink))
        goto is_linked;
    /* add this fd to the ready list of the epoll instance */
    list_add_tail(&epi->rdllink, &ep->rdllist);

is_linked:
    /*
     * Wake up ( if active ) both the eventpoll wait list and the ->poll()
     * wait list.
     */
    /* wake the process sleeping in epoll_wait(); the user-level epoll_wait(...) then returns before its timeout */
    if (waitqueue_active(&ep->wq))
        wake_up_locked(&ep->wq);
    if (waitqueue_active(&ep->poll_wait))
        pwake++;

out_unlock:
    spin_unlock_irqrestore(&ep->lock, flags);

    /* We have to call this outside the lock */
    if (pwake)
        ep_poll_safewake(&psw, &ep->poll_wait);

    return 1;
} 
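The key idea in ep_poll_callback is that "waking up" a wait queue entry means running the function stored in it, and that function only records readiness once. A rough user-space model, with hypothetical `toy_*` names and a flag in place of the real ready list, might look like this:

```c
#include <assert.h>
#include <stddef.h>

struct toy_wait_entry;
typedef int (*toy_wake_func)(struct toy_wait_entry *entry);

struct toy_wait_entry {
    toy_wake_func func;             /* what "waking" this entry means */
    struct toy_wait_entry *next;
    int on_ready_list;              /* stand-in for ep_is_linked(&epi->rdllink) */
    int times_called;
};

struct toy_wait_queue { struct toy_wait_entry *first; };

void toy_add_wait_queue(struct toy_wait_queue *q, struct toy_wait_entry *e)
{
    e->next = q->first;
    q->first = e;
}

/* Device side: waking the queue just runs every entry's function. */
int toy_wake_up(struct toy_wait_queue *q)
{
    int handled = 0;
    struct toy_wait_entry *e;

    for (e = q->first; e != NULL; e = e->next)
        handled += e->func(e);
    return handled;
}

/* epoll side: records readiness once, in the spirit of ep_poll_callback(). */
int toy_poll_callback(struct toy_wait_entry *e)
{
    e->times_called++;
    if (!e->on_ready_list)
        e->on_ready_list = 1;       /* the real code does list_add_tail() here */
    return 1;
}

/* Usage: two device events arrive before anyone collects the first one. */
int toy_wakeup_demo(void)
{
    struct toy_wait_queue dev = { NULL };
    struct toy_wait_entry a = { toy_poll_callback, NULL, 0, 0 };
    int handled;

    toy_add_wait_queue(&dev, &a);
    handled = toy_wake_up(&dev);
    handled += toy_wake_up(&dev);
    assert(a.on_ready_list == 1 && a.times_called == 2);
    return handled;
}
```

The callback runs for every device event, but the item is queued only the first time, which mirrors the `ep_is_linked()` shortcut in the listing above.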

epoll_wait

epoll_wait is implemented as follows:

asmlinkage long sys_epoll_wait(int epfd, struct epoll_event __user *events,
                   int maxevents, int timeout)
{
    int error;
    struct file *file;
    struct eventpoll *ep;


    /* The maximum number of event must be greater than zero */
    if (maxevents <= 0 || maxevents > EP_MAX_EVENTS)
        return -EINVAL;

    /* Verify that the area passed by the user is writeable */
    /* check that the memory events points to in user space is writable; see __range_not_ok() */
    if (!access_ok(VERIFY_WRITE, events, maxevents * sizeof(struct epoll_event))) {
        error = -EFAULT;
        goto error_return;
    }

    /* Get the "struct file *" for the eventpoll file */
    /* get the file instance of the eventpoll file for epfd; this struct file was created in epoll_create */
    error = -EBADF;
    file = fget(epfd);
    if (!file)
        goto error_return;

    /*
     * We have to check that the file structure underneath the fd
     * the user passed to us _is_ an eventpoll file.
     */
    /* decide whether epfd is an eventpoll file by checking whether its file operations are eventpoll_fops; if not, return -EINVAL */
    error = -EINVAL;
    if (!is_file_epoll(file))
        goto error_fput;

    /*
     * At this point it is safe to assume that the "private_data" contains
     * our own data structure.
     */
    ep = file->private_data;

    /* Time to fish for events ... */
    error = ep_poll(ep, events, maxevents, timeout);

error_fput:
    fput(file);
error_return:

    return error;
}
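The argument checks at the top of sys_epoll_wait map directly onto errno values that a caller can observe. The sketch below uses hypothetical helper names and collects four cases: maxevents of zero, an fd that is not an epoll file, a valid call with a zero timeout, and an invalid fd.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if epoll_wait() succeeds, otherwise the errno it set. */
int wait_errno(int fd, int maxevents)
{
    struct epoll_event events[1];

    if (epoll_wait(fd, events, maxevents, 0) < 0)
        return errno;
    return 0;
}

/*
 * results[0]: maxevents == 0 on a valid epoll fd
 * results[1]: a pipe fd passed where an epoll fd is expected
 * results[2]: valid epoll fd, maxevents == 1, nothing registered
 * results[3]: fd == -1
 * Returns 0 if the setup worked, -1 otherwise.
 */
int demo_wait_checks(int results[4])
{
    int pipefd[2];
    int epfd;

    assert(results != NULL);
    epfd = epoll_create(1);
    if (epfd < 0)
        return -1;
    if (pipe(pipefd) < 0) {
        close(epfd);
        return -1;
    }

    results[0] = wait_errno(epfd, 0);
    results[1] = wait_errno(pipefd[0], 1);
    results[2] = wait_errno(epfd, 1);
    results[3] = wait_errno(-1, 1);

    close(pipefd[0]);
    close(pipefd[1]);
    close(epfd);
    return 0;
}
```

The expected results are EINVAL, EINVAL, 0 and EBADF, corresponding to the maxevents check, the is_file_epoll() check, a normal return with no events, and the failed fget().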

ep_poll

epoll_wait calls ep_poll, which is implemented as follows:

static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
           int maxevents, long timeout)
{
    int res, eavail;
    unsigned long flags;
    long jtimeout;
    wait_queue_t wait;

    /*
     * Calculate the timeout by checking for the "infinite" value ( -1 )
     * and the overflow condition. The passed timeout is in milliseconds,
     * that why (t * HZ) / 1000.
     */
    /* timeout is in milliseconds and is converted to jiffies here; adding 999 (i.e. 1000 - 1) rounds up */
    jtimeout = (timeout < 0 || timeout >= EP_MAX_MSTIMEO) ?
        MAX_SCHEDULE_TIMEOUT : (timeout * HZ + 999) / 1000;

retry:
    spin_lock_irqsave(&ep->lock, flags);

    res = 0;
    if (list_empty(&ep->rdllist)) {
        /*
         * We don't have any available event to return to the caller.
         * We need to sleep here, and we will be wake up by
         * ep_poll_callback() when events will become available.
         */
        /* No events, so we have to sleep. ep_poll_callback() wakes us when an event arrives. */
        init_waitqueue_entry(&wait, current); /* bind the current process to the wait queue entry "wait" */
        wait.flags |= WQ_FLAG_EXCLUSIVE;
        /* add the current process to the eventpoll wait queue, to wait until a file becomes ready, the timeout expires, or a signal interrupts */
        __add_wait_queue(&ep->wq, &wait);

        for (;;) {
            /*
             * We don't want to sleep if the ep_poll_callback() sends us
             * a wakeup in between. That's why we set the task state
             * to TASK_INTERRUPTIBLE before doing the checks.
             */
            /* ep_poll_callback() must be able to wake the current process, so its state has to be the wakeable TASK_INTERRUPTIBLE */
            set_current_state(TASK_INTERRUPTIBLE);
            /* if the ready list is not empty (some file is ready) or the timeout has expired, leave the loop */
            if (!list_empty(&ep->rdllist) || !jtimeout)
                break;
            /* if the current process has a pending signal, leave the loop and return -EINTR */
            if (signal_pending(current)) {
                res = -EINTR;
                break;
            }

            spin_unlock_irqrestore(&ep->lock, flags);
            /*
             * Give up the processor and wait for ep_poll_callback() to wake
             * the current process or for the timeout to expire; the return
             * value is the time remaining.
             * From here on the current process sleeps until some file becomes
             * ready or the timeout expires.
             * When a file becomes ready, the eventpoll callback
             * ep_poll_callback() wakes the processes on the ep->wq wait queue.
             */
            jtimeout = schedule_timeout(jtimeout);
            spin_lock_irqsave(&ep->lock, flags);
        }
        __remove_wait_queue(&ep->wq, &wait);

        set_current_state(TASK_RUNNING);
    }

    /* Is it worth to try to dig for events ? */
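    /*
     * ep->ovflist holds the files that became ready while events were being
     * passed to user space. So if either the ready list ep->rdllist is not
     * empty or ep->ovflist is not EP_UNACTIVE_PTR, some file may already be
     * ready.
     * ep->ovflist != EP_UNACTIVE_PTR covers two cases: if it is NULL, events
     * may be in the middle of being passed to user space and no file is
     * necessarily ready; if it is not NULL, some file is definitely ready.
     * See ep_send_events(). (The version shown here only tests rdllist.)
     */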
    eavail = !list_empty(&ep->rdllist);

    spin_unlock_irqrestore(&ep->lock, flags);

    /*
     * Try to transfer events to user space. In case we get 0 events and
     * there's still timeout left over, we go trying again in search of
     * more luck.
     */
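    /*
     * If we were not interrupted by a signal and events were available, but
     * none could be fetched (another process may have taken them) and the
     * timeout has not expired, jump back to retry and wait again for a file
     * to become ready.
     */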
    if (!res && eavail &&
        !(res = ep_send_events(ep, events, maxevents)) && jtimeout)
        goto retry;
  
    /* return the number of events fetched, or an error code */
    return res;
}
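The timeout conversion at the top of ep_poll is plain integer arithmetic and can be checked in isolation. In the sketch below, `ms_to_jiffies_ceil` is a hypothetical helper; `hz` stands in for the kernel's HZ, `max_ms` for EP_MAX_MSTIMEO (whose value is not given in the listing), and LONG_MAX for MAX_SCHEDULE_TIMEOUT.

```c
#include <assert.h>
#include <limits.h>

/*
 * Convert a millisecond timeout to jiffies, rounding up, following
 * (timeout * HZ + 999) / 1000 from ep_poll().
 * Negative or too-large timeouts mean "wait forever".
 */
long ms_to_jiffies_ceil(long timeout_ms, long hz, long max_ms)
{
    assert(hz > 0);
    if (timeout_ms < 0 || timeout_ms >= max_ms)
        return LONG_MAX;
    return (timeout_ms * hz + 999) / 1000;
}
```

With hz = 250, one jiffy is 4 ms: a 4 ms timeout gives (1000 + 999) / 1000 = 1 jiffy, while 5 ms gives (1250 + 999) / 1000 = 2 jiffies, so a caller never sleeps for less time than it asked for. A timeout of 0 gives 999 / 1000 = 0, which makes the loop in ep_poll exit without sleeping.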

ep_send_events() sends the ready events to user space.

ep_send_events() wraps the user-supplied buffer in an ep_send_events_data structure and then calls ep_scan_ready_list() to copy the events on the ready list into that user-space memory, where the application reads and handles them.
