Building a Local Chatbot with Keras/TensorFlow


The concept of a chatbot (ChatRobot) is nothing new. Perhaps you have flirted with Siri out of boredom, or bantered with Xiao AI in an idle moment; either way, we have to admit that artificial intelligence is now woven into our lives. Bots offering third-party APIs are everywhere on the market: Microsoft XiaoIce, Turing Robot, Tencent Chat, Qingyunke, and so on, any of which we can wire into an app or web application whenever we like. But how are these services actually implemented under the hood? Without any network access, could a robot converse with humans using nothing but a locally stored "mind ball", the way the TV series Westworld depicts? If you are not content to remain a mere "package caller", follow along: this time we will use the deep learning libraries Keras/TensorFlow to build our own local chatbot, with no third-party APIs and no network dependency.

First, install the dependencies:

pip3 install tensorflow
pip3 install keras
pip3 install nltk
pip3 install pandas
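
A note on versions: the code in this article targets the standalone-Keras / TensorFlow 1.x era API (for example SGD(lr=...) rather than today's SGD(learning_rate=...)). If current releases complain, pinning older ones may help; the pins below are a guess on my part, so adjust them to whatever pair works in your environment:

pip3 install "tensorflow<2"
pip3 install "keras<2.4"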

Then create a script called test_bot.py and import the required libraries:

import nltk
import ssl  # only needed for the certificate workaround shown below
from nltk.stem.lancaster import LancasterStemmer
stemmer = LancasterStemmer()

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.optimizers import SGD
import pandas as pd
import pickle
import random

There is a pitfall here: the natural language toolkit NLTK will report an error:

Resource punkt not found

Normally, adding a single downloader line would fix it:

import nltk
nltk.download('punkt')
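
If the download fails with an SSL certificate error rather than a blocked connection, a common workaround (and the reason test_bot.py imports ssl) is to disable certificate verification just for the download:

try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # older Python builds lack this attribute and need no workaround
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

nltk.download('punkt')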

But if network restrictions block the download itself, it is hard to fetch through Python's downloader at all, so we take the scenic route and download the archive by hand:

https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip

After unzipping it, place it under your user directory:

C:\Users\liuyue\nltk_data\tokenizers\punkt
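
To confirm that NLTK can now see the manually installed package, a quick check (nltk.data.find raises LookupError if the resource is still missing):

import nltk
print(nltk.data.find('tokenizers/punkt'))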

OK, back to business. The main challenge in developing a chatbot is classifying the user's input and recognizing the intent behind it (classical machine learning could solve this, but it is fiddly, so I am taking the lazy route with deep learning and Keras). The second challenge is keeping context, that is, analyzing and tracking the conversation. In most cases we do not need to classify intent very finely; we can simply treat the user's input as the answer to the chatbot's question. So here we use the Keras deep learning library to build a classification model.

The chatbot's intents and the patterns it needs to learn are defined in one simple variable. There is no need for a terabyte-scale corpus. Granted, bot hobbyists without a corpus tend to get laughed at, but our goal is only to build a chatbot for one specific context. So the classification model is created with a small vocabulary, and it will only be able to recognize the small set of patterns provided for training.

Put plainly, so-called machine learning is you repeatedly teaching a machine to do one or a few things correctly: during training you keep demonstrating the right behavior, hoping the machine can generalize from it. This time we teach it only one thing, just to test its reaction. A bit like training the family dog, except the dog cannot chat with you.

The intents variable here is just a small example; if you like, you can expand it indefinitely with a corpus:

intents = {"intents": [
        {"tag": "打招呼",
         "patterns": ["你好", "您好", "請問", "有人嗎", "師傅","不好意思","美女","帥哥","靚妹","hi"],
         "responses": ["您好", "又是您啊", "吃了么您內","您有事嗎"],
         "context": [""]
        },
        {"tag": "告別",
         "patterns": ["再見", "拜拜", "88", "回見", "回頭見"],
         "responses": ["再見", "一路順風", "下次見", "拜拜了您內"],
         "context": [""]
        },
   ]
}

As you can see, I inserted two context tags, greeting (打招呼) and farewell (告別), each with user input patterns and the bot's response data.
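
Once the intent data outgrows a toy example, it is handier to keep it in an external file. A minimal sketch, assuming a hypothetical intents.json with the same structure as the dict above:

import json

# load the intent definitions from an external UTF-8 JSON file
with open("intents.json", encoding="utf-8") as f:
    intents = json.load(f)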

Before training the classification model we need to build a vocabulary. The patterns are processed to build it: each word is stemmed to produce a common root, which helps match more combinations of user input.
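
A quick illustration of what the stemmer does; the exact roots depend on Lancaster's rules, and Chinese strings should come back unchanged since the rules are letter-based:

from nltk.stem.lancaster import LancasterStemmer

stemmer = LancasterStemmer()
for token in ["hi", "greetings", "再見"]:
    # print each token alongside its stemmed root
    print(token, "->", stemmer.stem(token.lower()))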

# containers: vocabulary, intent labels, and (tokenized pattern, tag) pairs
words = []
classes = []
documents = []
# tokens to exclude from the vocabulary (left empty here; punctuation could go in)
ignore_words = []

for intent in intents['intents']:
    for pattern in intent['patterns']:
        # tokenize each word in the sentence
        w = nltk.word_tokenize(pattern)
        # add to our words list
        words.extend(w)
        # add to documents in our corpus
        documents.append((w, intent['tag']))
        # add to our classes list
        if intent['tag'] not in classes:
            classes.append(intent['tag'])

# stem, lowercase and deduplicate the vocabulary
words = [stemmer.stem(w.lower()) for w in words if w not in ignore_words]
words = sorted(list(set(words)))

classes = sorted(list(set(classes)))

print (len(classes), "語境", classes)

print (len(words), "詞數", words)

Output:

2 語境 ['告別', '打招呼']
14 詞數 ['88', '不好意思', '你好', '再見', '回頭見', '回見', '帥哥', '師傅', '您好', '拜拜', '有人嗎', '美女', '請問', '靚妹']

Training does not analyze the raw words themselves, because words carry no meaning for the machine (this is also a trap many Chinese word-segmentation libraries fall into: the machine does not actually understand whether your input is English or Chinese). We only need to convert each word or Chinese token into a bag-of-words array of 0s and 1s. The array length equals the vocabulary size, and a position is set to 1 when the corresponding word appears in the current pattern.

# create our training data
training = []
# create an empty array for our output
output_empty = [0] * len(classes)
# training set, bag of words for each sentence
for doc in documents:
    # initialize our bag of words
    bag = []

    # the tokenized words of the pattern
    pattern_words = doc[0]
    # stem each word
    pattern_words = [stemmer.stem(word.lower()) for word in pattern_words]
    # mark 1 for every vocabulary word that appears in the pattern
    for w in words:
        bag.append(1 if w in pattern_words else 0)

    # output is 1 for the current tag and 0 for the rest
    output_row = list(output_empty)
    output_row[classes.index(doc[1])] = 1

    training.append([bag, output_row])

# shuffle the samples and convert to an array (dtype=object: each row holds two lists)
random.shuffle(training)
training = np.array(training, dtype=object)

train_x = list(training[:,0])
train_y = list(training[:,1])
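
A quick sanity check on the tensors before fitting (the exact counts depend on your intents data):

print(len(train_x), "samples,", len(train_x[0]), "features,", len(train_y[0]), "classes")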

Now we start training. The model is built with Keras and consists of three layers. Since the dataset is tiny, the classification output is a multi-class array, which helps identify the encoded intent. Softmax activation produces the multi-class output (the result comes back as a 0/1-style array such as [1,0,0,...,0], from which the encoded intent can be read off).

model = Sequential()
model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_y[0]), activation='softmax'))


sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])


model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)

This trains for 200 epochs with a batch size of 5. My test sample is small, so 100 epochs would work just as well; that is not the point.
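
Optionally, print the architecture before fitting to double-check the layer sizes:

model.summary()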

Start training:

Epoch 1/200
14/14 [==============================] - 0s 32ms/step - loss: 0.7305 - acc: 0.5000
Epoch 2/200
14/14 [==============================] - 0s 391us/step - loss: 0.7458 - acc: 0.4286
Epoch 3/200
14/14 [==============================] - 0s 390us/step - loss: 0.7086 - acc: 0.3571
Epoch 4/200
14/14 [==============================] - 0s 395us/step - loss: 0.6941 - acc: 0.6429
Epoch 5/200
14/14 [==============================] - 0s 426us/step - loss: 0.6358 - acc: 0.7143
Epoch 6/200
14/14 [==============================] - 0s 356us/step - loss: 0.6287 - acc: 0.5714
Epoch 7/200
14/14 [==============================] - 0s 366us/step - loss: 0.6457 - acc: 0.6429
Epoch 8/200
14/14 [==============================] - 0s 899us/step - loss: 0.6336 - acc: 0.6429
Epoch 9/200
14/14 [==============================] - 0s 464us/step - loss: 0.5815 - acc: 0.6429
Epoch 10/200
14/14 [==============================] - 0s 408us/step - loss: 0.5895 - acc: 0.6429
Epoch 11/200
14/14 [==============================] - 0s 548us/step - loss: 0.6050 - acc: 0.6429
Epoch 12/200
14/14 [==============================] - 0s 468us/step - loss: 0.6254 - acc: 0.6429
Epoch 13/200
14/14 [==============================] - 0s 388us/step - loss: 0.4990 - acc: 0.7857
Epoch 14/200
14/14 [==============================] - 0s 392us/step - loss: 0.5880 - acc: 0.7143
Epoch 15/200
14/14 [==============================] - 0s 370us/step - loss: 0.5118 - acc: 0.8571
Epoch 16/200
14/14 [==============================] - 0s 457us/step - loss: 0.5579 - acc: 0.7143
Epoch 17/200
14/14 [==============================] - 0s 432us/step - loss: 0.4535 - acc: 0.7857
Epoch 18/200
14/14 [==============================] - 0s 357us/step - loss: 0.4367 - acc: 0.7857
Epoch 19/200
14/14 [==============================] - 0s 384us/step - loss: 0.4751 - acc: 0.7857
Epoch 20/200
14/14 [==============================] - 0s 346us/step - loss: 0.4404 - acc: 0.9286
Epoch 21/200
14/14 [==============================] - 0s 500us/step - loss: 0.4325 - acc: 0.8571
Epoch 22/200
14/14 [==============================] - 0s 400us/step - loss: 0.4104 - acc: 0.9286
Epoch 23/200
14/14 [==============================] - 0s 738us/step - loss: 0.4296 - acc: 0.7857
Epoch 24/200
14/14 [==============================] - 0s 387us/step - loss: 0.3706 - acc: 0.9286
Epoch 25/200
14/14 [==============================] - 0s 430us/step - loss: 0.4213 - acc: 0.8571
Epoch 26/200
14/14 [==============================] - 0s 351us/step - loss: 0.2867 - acc: 1.0000
Epoch 27/200
14/14 [==============================] - 0s 3ms/step - loss: 0.2903 - acc: 1.0000
Epoch 28/200
14/14 [==============================] - 0s 366us/step - loss: 0.3010 - acc: 0.9286
Epoch 29/200
14/14 [==============================] - 0s 404us/step - loss: 0.2466 - acc: 0.9286
Epoch 30/200
14/14 [==============================] - 0s 428us/step - loss: 0.3035 - acc: 0.7857
Epoch 31/200
14/14 [==============================] - 0s 407us/step - loss: 0.2075 - acc: 1.0000
Epoch 32/200
14/14 [==============================] - 0s 457us/step - loss: 0.2167 - acc: 0.9286
Epoch 33/200
14/14 [==============================] - 0s 613us/step - loss: 0.1266 - acc: 1.0000
Epoch 34/200
14/14 [==============================] - 0s 534us/step - loss: 0.2906 - acc: 0.9286
Epoch 35/200
14/14 [==============================] - 0s 463us/step - loss: 0.2560 - acc: 0.9286
Epoch 36/200
14/14 [==============================] - 0s 500us/step - loss: 0.1686 - acc: 1.0000
Epoch 37/200
14/14 [==============================] - 0s 387us/step - loss: 0.0922 - acc: 1.0000
Epoch 38/200
14/14 [==============================] - 0s 430us/step - loss: 0.1620 - acc: 1.0000
Epoch 39/200
14/14 [==============================] - 0s 371us/step - loss: 0.1104 - acc: 1.0000
Epoch 40/200
14/14 [==============================] - 0s 488us/step - loss: 0.1330 - acc: 1.0000
Epoch 41/200
14/14 [==============================] - 0s 381us/step - loss: 0.1322 - acc: 1.0000
Epoch 42/200
14/14 [==============================] - 0s 462us/step - loss: 0.0575 - acc: 1.0000
Epoch 43/200
14/14 [==============================] - 0s 1ms/step - loss: 0.1137 - acc: 1.0000
Epoch 44/200
14/14 [==============================] - 0s 450us/step - loss: 0.0245 - acc: 1.0000
Epoch 45/200
14/14 [==============================] - 0s 470us/step - loss: 0.1824 - acc: 1.0000
Epoch 46/200
14/14 [==============================] - 0s 444us/step - loss: 0.0822 - acc: 1.0000
Epoch 47/200
14/14 [==============================] - 0s 436us/step - loss: 0.0939 - acc: 1.0000
Epoch 48/200
14/14 [==============================] - 0s 396us/step - loss: 0.0288 - acc: 1.0000
Epoch 49/200
14/14 [==============================] - 0s 580us/step - loss: 0.1367 - acc: 0.9286
Epoch 50/200
14/14 [==============================] - 0s 351us/step - loss: 0.0363 - acc: 1.0000
Epoch 51/200
14/14 [==============================] - 0s 379us/step - loss: 0.0272 - acc: 1.0000
Epoch 52/200
14/14 [==============================] - 0s 358us/step - loss: 0.0712 - acc: 1.0000
Epoch 53/200
14/14 [==============================] - 0s 4ms/step - loss: 0.0426 - acc: 1.0000
Epoch 54/200
14/14 [==============================] - 0s 370us/step - loss: 0.0430 - acc: 1.0000
Epoch 55/200
14/14 [==============================] - 0s 368us/step - loss: 0.0292 - acc: 1.0000
Epoch 56/200
14/14 [==============================] - 0s 494us/step - loss: 0.0777 - acc: 1.0000
Epoch 57/200
14/14 [==============================] - 0s 356us/step - loss: 0.0496 - acc: 1.0000
Epoch 58/200
14/14 [==============================] - 0s 427us/step - loss: 0.1485 - acc: 1.0000
Epoch 59/200
14/14 [==============================] - 0s 381us/step - loss: 0.1006 - acc: 1.0000
Epoch 60/200
14/14 [==============================] - 0s 421us/step - loss: 0.0183 - acc: 1.0000
Epoch 61/200
14/14 [==============================] - 0s 344us/step - loss: 0.0788 - acc: 0.9286
Epoch 62/200
14/14 [==============================] - 0s 529us/step - loss: 0.0176 - acc: 1.0000

OK. After all 200 epochs (the log above is truncated), the model is trained. Now declare a method to convert input into a bag of words:

def clean_up_sentence(sentence):
    # tokenize the pattern - split words into array
    sentence_words = nltk.word_tokenize(sentence)
    # stem each word - create short form for word
    sentence_words = [stemmer.stem(word.lower()) for word in sentence_words]
    return sentence_words

def bow(sentence, words, show_details=True):
    # tokenize the pattern
    sentence_words = clean_up_sentence(sentence)
    # bag of words - matrix of N words, vocabulary matrix
    bag = [0]*len(words)  
    for s in sentence_words:
        for i,w in enumerate(words):
            if w == s: 
                # assign 1 if current word is in the vocabulary position
                bag[i] = 1
                if show_details:
                    print ("found in bag: %s" % w)
    return(np.array(bag))

Test it to see whether we can hit the bag of words:

p = bow("你好", words)
print (p)

Output:

found in bag: 你好
[0 0 1 0 0 0 0 0 0 0 0 0 0 0]

The match clearly succeeded; the word is in the bag.

Before we package the model, we can use the model.predict function to classify user input for testing, returning the user's intent based on the computed probability (multiple intents can be returned, sorted by probability in descending order):

def classify_local(sentence):
    ERROR_THRESHOLD = 0.25
    
    # generate probabilities from the model
    input_data = pd.DataFrame([bow(sentence, words)], dtype=float, index=['input'])
    results = model.predict([input_data])[0]
    # filter out predictions below a threshold, and provide intent index
    results = [[i,r] for i,r in enumerate(results) if r>ERROR_THRESHOLD]
    # sort by strength of probability
    results.sort(key=lambda x: x[1], reverse=True)
    return_list = []
    for r in results:
        return_list.append((classes[r[0]], str(r[1])))
    # return the list of (intent, probability) tuples
    return return_list

Test it:

print(classify_local('您好'))

Output:

found in bag: 您好
[('打招呼', '0.999913')]

Test again:

print(classify_local('88'))

Output:

found in bag: 88
[('告別', '0.9995449')]

Perfect: the greeting and farewell context tags are both matched. If you like, test a few more inputs to refine the model.
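For instance, a quick loop over a few more of the training patterns:

for s in ["再見", "帥哥", "回頭見"]:
    # each line prints the input and its (intent, probability) matches
    print(s, classify_local(s))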

Once testing is done, we can package the trained model so it does not have to be retrained before every call:

model.save("./v3u.h5")

The classification model is written to the project root as v3u.h5; keep it safe, as we will need it in a moment.
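
The .h5 file alone is not enough for the API below: the service also needs the words, classes and intents built during training. Since test_bot.py already imports pickle, a minimal sketch that saves them next to the model (the file name v3u_meta.pkl is my own choice, not anything the format requires):

# persist the vocabulary, tags and intent data for the API process
with open("./v3u_meta.pkl", "wb") as f:
    pickle.dump((words, classes, intents), f)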

Next, let's build an API for the chatbot. Here we use FastAPI, a framework that is extremely popular at the moment (install it with pip3 install fastapi uvicorn if you have not). Put the model file into the project directory and write main.py:

import random
import pickle

import nltk
import numpy as np
import pandas as pd
import uvicorn
from fastapi import FastAPI
from keras.models import load_model
from nltk.stem.lancaster import LancasterStemmer

stemmer = LancasterStemmer()
app = FastAPI()

# load the model trained earlier once, at startup, rather than on every request
model = load_model("./v3u.h5")

# restore the vocabulary, tags and intent data pickled after training (see above)
with open("./v3u_meta.pkl", "rb") as f:
    words, classes, intents = pickle.load(f)


# the same helpers as in test_bot.py
def clean_up_sentence(sentence):
    # tokenize the pattern and stem each word
    sentence_words = nltk.word_tokenize(sentence)
    return [stemmer.stem(word.lower()) for word in sentence_words]


def bow(sentence, words):
    # bag of words: a 0/1 vector over the vocabulary
    sentence_words = clean_up_sentence(sentence)
    bag = [0] * len(words)
    for s in sentence_words:
        for i, w in enumerate(words):
            if w == s:
                bag[i] = 1
    return np.array(bag)


def classify_local(sentence):
    ERROR_THRESHOLD = 0.25

    # generate probabilities from the model
    input_data = pd.DataFrame([bow(sentence, words)], dtype=float, index=['input'])
    results = model.predict([input_data])[0]
    # filter out predictions below a threshold, and provide intent index
    results = [[i, r] for i, r in enumerate(results) if r > ERROR_THRESHOLD]
    # sort by strength of probability
    results.sort(key=lambda x: x[1], reverse=True)
    # return the list of (intent, probability) tuples
    return [(classes[r[0]], str(r[1])) for r in results]


@app.get('/')
async def root(word: str = ""):
    wordlist = classify_local(word)
    # pick a random response from the matched intent
    a = ""
    for intent in intents['intents']:
        if wordlist and intent['tag'] == wordlist[0][0]:
            a = random.choice(intent['responses'])
    return {'message': a}


if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)

Here the lines:

from keras.models import load_model

model = load_model("./v3u.h5")

restore the model we trained a moment ago; loading it once at module level avoids re-reading the file on every request. Now start the service:

uvicorn main:app --reload
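
Once the service is up, you can exercise it from another terminal. A sample exchange (the reply is picked at random from the matched intent's responses, so yours may differ):

curl "http://127.0.0.1:8000/?word=你好"
{"message":"您好"}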


Conclusion: there is no doubt that technology changes life. A chatbot lets us hear a friendly voice even without good company at hand, and I believe that before long a laughing, elegantly dressed "machine maiden" will keep us company beneath the clear breeze and the bright moon.
