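How to Build an Online Read-Aloud Feature: iFLYTEK Speech Synthesis in Practice
It has been a long time since I wrote a technical blog post. On a whim, I'm picking the habit back up.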
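Background
I use the Xuexi Qiangguo (學習強國) app every day, and its read-aloud feature is quite good. After looking into it, I learned that it is powered by iFLYTEK (科大訊飛), so I started coding. API documentation: //www.xfyun.cn/doc/tts/online_tts/API.html
I won't go over registering as a developer or the API requirements; the documentation covers them clearly. The actual implementation, of course, is another matter.
I listened to the results. How should I put it: there is a big gap between the free voices and the premium ones. The free voices sit at just about the level you can tolerate, while the premium ones are hard to tell from a real person. Pricing has two parts: a fee for the API and a fee for the premium voices. In keeping with a coder's habits, everything starts with the free tier.
Details here: //www.xfyun.cn/services/online_tts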
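Getting started
I looked around and found no C# demo, which was awkward. There is documentation, but as everyone knows (think of the WeChat Official Account development docs), turning documentation into real code and a visible application takes considerable effort. After some searching I finally found an open-source project that had only recently been released. I was delighted and got to work: //github.com/zuiyuewentian/XunFeiNETSDK
iFLYTEK's API is based on WebSocket, so let's start with a console program as a demo. C# ships with its own WebSocket support, but WebSocketSharp is used here instead, and I like it a lot. System.Net.WebSockets.WebSocket is built on asynchronous methods (I will come back to it later), whereas WebSocketSharp.WebSocket is event-based, which fits front-end programming habits well.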
websocket = new WebSocketSharp.WebSocket(reqUrl);
websocket.OnMessage += Websocket_OnMessage;
websocket.OnOpen += Websocket_OnOpen;
websocket.Connect();
Once iFLYTEK's server receives our text, it sends the audio back as a stream. On our own server we only need to turn that stream into a file.
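To make the streaming part concrete, here is a minimal JavaScript sketch of collecting the audio frames. It assumes each WebSocket message is a JSON frame with a numeric `code`, a `message`, and a `data` object holding a Base64 `audio` chunk plus a `status` flag, where 2 marks the last frame. The `status == 2` convention comes from the demo code in this post; the JSON field names are my assumption and should be checked against the API documentation. `createAudioCollector` is a hypothetical helper, not part of any SDK.

```javascript
// Hypothetical helper: collects Base64 audio chunks from JSON frames.
// Assumed frame shape (verify against the API docs):
//   { code, message, data: { audio, status } }
function createAudioCollector() {
  const chunks = [];
  return {
    // Returns the complete audio as a Uint8Array on the last frame,
    // otherwise returns null.
    push(frameJson) {
      const frame = JSON.parse(frameJson);
      if (frame.code !== 0) {
        throw new Error("synthesis failed: " + frame.message);
      }
      // Decode this frame's Base64 payload into raw bytes.
      const binary = atob(frame.data.audio || "");
      chunks.push(Uint8Array.from(binary, (ch) => ch.charCodeAt(0)));
      if (frame.data.status !== 2) {
        return null; // more frames are expected
      }
      // Last frame: join all chunks into one contiguous buffer.
      const total = chunks.reduce((n, c) => n + c.length, 0);
      const audio = new Uint8Array(total);
      let offset = 0;
      for (const c of chunks) {
        audio.set(c, offset);
        offset += c.length;
      }
      chunks.length = 0; // ready for the next synthesis
      return audio;
    },
  };
}
```

Collecting the chunks in a list and joining them once at the end avoids copying the whole buffer again on every frame.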
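using System;
using System.Diagnostics;
using System.Linq;
using NAudio.Wave;
// XunFeiTTS and TTS_Data_Model come from the XunFeiNETSDK project.

class Program
{
    private static Stopwatch stopwatch;

    // Audio bytes received so far for the current synthesis.
    private static byte[] data = new byte[0];

    public static void Main(string[] args)
    {
        // text: the text to synthesize; pathUrl: the domain
        stopwatch = new Stopwatch();
        stopwatch.Start();
        var xunFeiNetSdk = new XunFeiTTS();
        xunFeiNetSdk.MessageUpdate_Event += XunFeiNetSdk_MessageUpdate_Event;
        // Sample text: "Zhangjiajie Hehua International Airport, Beijing Daxing Airport,
        // Changsha Huanghua Airport, Shaoyang Wugang Airport: all flights have fully resumed!"
        xunFeiNetSdk.SendData("張家界荷花國際機場,北京大興機場,長沙黃花機場,邵陽武岡機場,所有航班全部復航!");
        Console.Read();
    }

    private static void XunFeiNetSdk_MessageUpdate_Event(TTS_Data_Model message, string error = null)
    {
        if (error != null)
        {
            Console.WriteLine(error);
            return;
        }
        try
        {
            // Every message carries a chunk of audio; append it first.
            data = data.Concat(message.audioStream).ToArray();

            // status == 2 means synthesis has finished.
            if (message.status != 2)
            {
                return;
            }

            Console.WriteLine("Synthesis succeeded");
            string voice = string.Format("{0}.wav", DateTime.Now.ToString("yyyyMMddHHmmssfff"));
            Console.WriteLine("Saving..." + voice);

            // 16 kHz, mono; the using block closes and disposes the writer.
            using (var mWavWriter = new WaveFileWriter(voice, new WaveFormat(16000, 1)))
            {
                mWavWriter.Write(data, 0, data.Length);
            }
            data = new byte[0]; // reset so the next synthesis starts clean

            Console.WriteLine("Saved.");
            var sp = stopwatch.Elapsed;
            Console.WriteLine("Elapsed: " + sp);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}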
The file is saved with NAudio. I pulled the code I needed out of XunFeiNETSDK so that it stands on its own.
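For readers curious about what "saving as WAV" involves, here is a small JavaScript sketch that wraps raw PCM bytes in the canonical 44-byte WAV header. It assumes the audio is 16-bit PCM at 16 kHz, mono, which mirrors the `new WaveFormat(16000, 1)` call in the demo. `buildWav` is a hypothetical name, and this is an illustration of the file layout rather than a description of how NAudio writes its header.

```javascript
// Hypothetical helper: wraps raw PCM bytes in a canonical 44-byte WAV header.
// Assumption: 16-bit PCM, 16 kHz, mono by default.
function buildWav(pcm, sampleRate = 16000, channels = 1, bitsPerSample = 16) {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const wav = new Uint8Array(44 + pcm.length);
  const view = new DataView(wav.buffer);
  const writeTag = (offset, tag) => {
    for (let i = 0; i < 4; i++) view.setUint8(offset + i, tag.charCodeAt(i));
  };

  writeTag(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true);             // size of the fmt chunk
  view.setUint16(20, 1, true);              // audio format 1 = PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeTag(36, "data");
  view.setUint32(40, pcm.length, true);     // size of the audio payload
  wav.set(pcm, 44);                         // payload follows the header
  return wav;
}
```

All multi-byte fields are little-endian, which is why every `setUint16` and `setUint32` call passes `true`.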
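(There have been so few flights over the last two months that my pay has dropped sharply. Forgive me for saying what's on my mind.)
And with that we have our audio. Give it a listen; it's acceptable. But how do we get this into a web page?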
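Turning it into a web application
The first idea is that the front end sends the text over, the back end hands it to the SDK to fetch the audio, and once the file is saved its address is returned to the front end. Because sending a message and receiving the result happen at different times, the most suitable approach is for the front end to use WebSocket as well. That in turn means the back end needs a WebSocket service.
I didn't want to run a separate WebSocket service, so the WebSocket endpoint can be exposed in the form of an API, as follows: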


using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Web;
using System.Web.Hosting;
using System.Web.Http;
using System.Web.WebSockets;
using NAudio.Wave;
// XunFeiTTS, TTS_Data_Model and Logger come from the project itself.

namespace HHOA.MVC5.Controllers.API
{
    [RoutePrefix("api/msg")]
    public class MsgApiController : ApiController
    {
        private static List<WebSocket> _sockets = new List<WebSocket>();
        private readonly XunFeiTTS _xunFei;
        private WebSocket currentSocket = null;

        // Audio bytes received so far for the current synthesis.
        private byte[] data = new byte[0];

        public MsgApiController()
        {
            _xunFei = new XunFeiTTS();
            _xunFei.MessageUpdate_Event += XunFeiNetSdk_MessageUpdate_Event;
            Logger.Info("Starting XunFeiTTS");
        }

        private void XunFeiNetSdk_MessageUpdate_Event(TTS_Data_Model message, string error = null)
        {
            if (error != null)
            {
                Logger.Debug(error);
                return;
            }
            try
            {
                // Every message carries a chunk of audio; append it first.
                data = data.Concat(message.audioStream).ToArray();

                // status == 2 means synthesis has finished.
                if (message.status != 2)
                {
                    return;
                }

                Logger.Info("Synthesis succeeded");
                var savePath = HostingEnvironment.MapPath("~/Files/Voice/");
                string diff = DateTime.Now.ToString("yyyyMMddHHmmssfff");
                string voice = string.Format("{0}.wav", diff);
                var filePath = savePath + voice;
                var di = new DirectoryInfo(savePath);
                if (!di.Exists)
                {
                    di.Create();
                }
                var webPath = "/Files/Voice/" + voice;
                Logger.Info("Saving..." + filePath);

                // The using block closes and disposes the writer, even on error.
                using (var mWavWriter = new WaveFileWriter(filePath, new WaveFormat(16000, 1)))
                {
                    mWavWriter.Write(data, 0, data.Length);
                }
                data = new byte[0]; // reset so the next synthesis starts clean
                Logger.Info("Saved.");

                // Send the audio address to the front end.
                if (currentSocket != null && currentSocket.State == WebSocketState.Open)
                {
                    var voiceBytes = Encoding.UTF8.GetBytes("voice:" + webPath);
                    var sendBuffer = new ArraySegment<byte>(voiceBytes);
                    currentSocket.SendAsync(sendBuffer, WebSocketMessageType.Text, true, CancellationToken.None);
                }
            }
            catch (Exception ex)
            {
                Logger.Debug(ex.Message);
            }
        }

        [Route]
        [HttpGet]
        public HttpResponseMessage Connect()
        {
            // Accept the WebSocket request on the server. The function passed in is the
            // WebSocket handler: it is called once the WebSocket is established, and
            // messages can be sent and received inside it.
            HttpContext.Current.AcceptWebSocketRequest(ProcessRequest);
            // Build the response that agrees to switch to WebSocket.
            return Request.CreateResponse(HttpStatusCode.SwitchingProtocols);
        }

        public async Task ProcessRequest(AspNetWebSocketContext context)
        {
            // The context passed in holds the current WebSocket object.
            var socket = context.WebSocket;
            // Add the WebSocket object to a static list.
            _sockets.Add(socket);
            var buffer = new ArraySegment<byte>(new byte[1024]);

            // Loop until the WebSocket is closed.
            while (true)
            {
                // A text longer than the buffer arrives in several pieces,
                // so keep reading until the end of the message.
                var messageBytes = new MemoryStream();
                WebSocketReceiveResult receivedResult;
                do
                {
                    receivedResult = await socket.ReceiveAsync(buffer, CancellationToken.None);
                    messageBytes.Write(buffer.Array, 0, receivedResult.Count);
                } while (!receivedResult.EndOfMessage);

                if (receivedResult.MessageType == WebSocketMessageType.Close)
                {
                    // The client asked to close; acknowledge it.
                    await socket.CloseAsync(WebSocketCloseStatus.Empty, string.Empty, CancellationToken.None);
                    _sockets.Remove(socket);
                    break;
                }

                if (socket.State == WebSocketState.Open)
                {
                    // A message arrived.
                    string recvMsg = Encoding.UTF8.GetString(messageBytes.ToArray());
                    Logger.Info("Received message: " + recvMsg);

                    // Remember the socket before synthesis starts, so the callback can reply.
                    currentSocket = socket;
                    // Forward the message to iFLYTEK.
                    _xunFei.SendData(recvMsg);

                    // Echo only the received text back, not the whole 1024-byte buffer.
                    var recvBytes = Encoding.UTF8.GetBytes(recvMsg);
                    var sendBuffer = new ArraySegment<byte>(recvBytes);
                    await socket.SendAsync(sendBuffer, WebSocketMessageType.Text, true, CancellationToken.None);
                }
            }
        }
    }
}
var webSocket;
var player = document.getElementById("player");

function sendSocketMsg() {
    var msg = $("#msg").val();
    webSocket.send(msg);
    showMsg("Sent message: " + msg, "blue");
}

openSocket();

function openSocket() {
    if (webSocket != null && typeof (webSocket) != "undefined") {
        closeSocket();
    }
    webSocket = new WebSocket("ws://" + location.hostname + ":" + location.port + "/api/msg");
    webSocket.onopen = function () {
        showMsg("Connection opened");
    }
    webSocket.onerror = function () {
        showMsg("An error occurred");
    }
    webSocket.onmessage = function (event) {
        showMsg("Received message: " + event.data, "yellow");
        // A message containing "voice:" carries the address of the audio file.
        if (event.data.indexOf("voice:") > -1) {
            var src = event.data.split("voice:")[1];
            player.src = src;
            player.play();
        }
    }
    webSocket.onclose = function () {
        showMsg("Connection closed");
    }
}

function closeSocket() {
    if (webSocket != null && typeof (webSocket) != "undefined") {
        webSocket.close();
    }
}

function showMsg(msg, type) {
    if (type === null || typeof (type) === "undefined") type = "gray";
    $("#show").append("<span class='" + type + "'>" + msg + "</span><br>");
}
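One detail worth tightening: the server echoes the user's text back over the same socket, so a text that happens to contain "voice:" would be mistaken for an audio address by the `indexOf` check. A minimal sketch of a stricter parser follows; `parseServerMessage` is a hypothetical helper that only treats a message as audio when it starts with the prefix.

```javascript
// Hypothetical helper for the "voice:<path>" convention used by the server code.
// Only a message that starts with the prefix counts as an audio address.
function parseServerMessage(data) {
  const prefix = "voice:";
  if (typeof data === "string" && data.startsWith(prefix)) {
    return { type: "voice", src: data.slice(prefix.length) };
  }
  // Anything else is plain text, such as the echo of what the user sent.
  return { type: "text", text: String(data) };
}
```

Inside `onmessage` this could be used as `var msg = parseServerMessage(event.data);`, setting `player.src = msg.src` only when `msg.type === "voice"`.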
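That gives us a prototype of the product. Next come questions such as how long the text can be, how the audio player should look, and whether the voice can be switched. Whenever someone describes a feature to you in one sentence, there are a great many details hiding behind it.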
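As a starting point for the text-length question, here is a minimal JavaScript sketch that splits a long text into chunks at sentence punctuation, so each chunk can be synthesized in its own request. It assumes the service caps the size of the text in a single request; the exact limit should be taken from the current API documentation, and the `maxBytes` values used with it are placeholders. `splitText` is a hypothetical helper.

```javascript
// Hypothetical helper: splits text into chunks of at most maxBytes UTF-8 bytes,
// preferring to cut after sentence-ending punctuation.
// maxBytes should be at least 4 so that any single character fits.
function splitText(text, maxBytes) {
  const encoder = new TextEncoder();
  const size = (s) => encoder.encode(s).length;
  // Each match is a sentence plus its trailing punctuation, or the final remainder.
  const sentences = text.match(/[^。!?!?;;\n]*[。!?!?;;\n]+|[^。!?!?;;\n]+/g) || [];
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk if this sentence would not fit in the current one.
    if (current && size(current + sentence) > maxBytes) {
      chunks.push(current);
      current = "";
    }
    // Append character by character so an oversized sentence is hard-split.
    for (const ch of sentence) {
      if (current && size(current + ch) > maxBytes) {
        chunks.push(current);
        current = "";
      }
      current += ch;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

The limit is measured in bytes rather than characters because a Chinese character takes three bytes in UTF-8, so a character count would underestimate the payload size.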
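Console version source: //download.csdn.net/download/stoneniqiu/12347028
Web version source: //download.csdn.net/download/stoneniqiu/12347167
If you don't have download points, follow my subscription account and reply with 語音合成 (speech synthesis).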