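This time I followed Microsoft's ML.NET API sample to build a sentiment analysis application. The implementation steps and code are included below for reference.
First, we need to understand the machine learning workflow, which roughly consists of the following steps:
- Understand the problem
- Ingest the data
- Data preprocessing and feature engineering
- Train the model and make predictions
- Evaluate the model
- Put the model into operation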
Environment requirements:
- Visual Studio 2017 15.6 or later
- .NET Core installed
- C# 7.1 (in my own testing, older C# versions produced errors)
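A note on the C# 7.1 requirement: Visual Studio 2017 projects default to the latest major language version (C# 7.0), so 7.1 usually has to be enabled explicitly. A likely reason older versions fail is that Program.cs relies on inferred tuple element names (item.sentiment, item.prediction), which is a C# 7.1 feature. A minimal sketch of the project-file setting, assuming an SDK-style .csproj file:

```xml
<!-- Add inside the project's .csproj file; assumes an SDK-style project -->
<PropertyGroup>
  <LangVersion>7.1</LangVersion>
</PropertyGroup>
```

The same setting should also be reachable in Visual Studio through the project's Build properties (Advanced, Language version).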
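Download the sample dataset. The paths in Program.cs expect a folder named "sentiment labelled sentences" that contains imdb_labelled.txt and yelp_labelled.txt.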
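Once the preparation is done, we can get started:
- First, create a project and name it SentimentAnalysis
- Next, create a Data folder under the project's Bin folder, unzip the sample dataset, and put it in the Data folder
- Then install the NuGet package: type Microsoft.ML in the NuGet search box and install it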
The implementation mainly uses two .cs files: Program.cs and SentimentData.cs
- Program.cs Source Code
using System;
using Microsoft.ML.Models;
using Microsoft.ML.Runtime;
using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

namespace SentimentAnalysis
{
    class Program
    {
        static void Main(string[] args)
        {
            // Training set (IMDB) and test set (Yelp), resolved relative to the working directory.
            const string _dataPath = @"..\..\data\sentiment labelled sentences\imdb_labelled.txt";
            const string _testDataPath = @"..\..\data\sentiment labelled sentences\yelp_labelled.txt";

            var model = TrainAndPredict(_dataPath, _testDataPath);
            Evaluate(model, _testDataPath);

            // Keep the console window open until a key is pressed.
            Console.Read();
        }

        public static PredictionModel<SentimentData, SentimentPrediction> TrainAndPredict(string _dataPath, string _testDataPath)
        {
            // Build the pipeline: load the tab-separated file, turn the text into
            // numeric features, then train a boosted decision tree classifier.
            var pipeline = new LearningPipeline();
            pipeline.Add(new TextLoader<SentimentData>(_dataPath, useHeader: false, separator: "tab"));
            pipeline.Add(new TextFeaturizer("Features", "SentimentText"));
            pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 150, NumTrees = 25, MinDocumentsInLeafs = 5 });

            PredictionModel<SentimentData, SentimentPrediction> model =
                pipeline.Train<SentimentData, SentimentPrediction>();

            // Sample sentences to classify. Sentiment = 0 is only a placeholder here;
            // the model predicts the label from SentimentText.
            IEnumerable<SentimentData> sentiments = new[]
            {
                new SentimentData
                {
                    SentimentText = "Contoso's 11 is a wonderful experience",
                    Sentiment = 0
                },
                new SentimentData
                {
                    SentimentText = "The acting in this movie is very bad",
                    Sentiment = 0
                },
                new SentimentData
                {
                    SentimentText = "Joe versus the Volcano Coffee Company is a great film.",
                    Sentiment = 0
                }
            };

            IEnumerable<SentimentPrediction> predictions = model.Predict(sentiments);

            Console.WriteLine();
            Console.WriteLine("Sentiment Predictions");
            Console.WriteLine("---------------------");

            // Pair each input with its prediction (inferred tuple names need C# 7.1).
            var sentimentsAndPredictions = sentiments.Zip(predictions, (sentiment, prediction) => (sentiment, prediction));
            foreach (var item in sentimentsAndPredictions)
            {
                Console.WriteLine($"Sentiment: {item.sentiment.SentimentText} | Prediction: {(item.prediction.Sentiment ? "Positive" : "Negative")}");
            }
            Console.WriteLine();

            return model;
        }

        public static void Evaluate(PredictionModel<SentimentData, SentimentPrediction> model, string _testDataPath)
        {
            // Score the trained model against the held-out test file.
            var testData = new TextLoader<SentimentData>(_testDataPath, useHeader: false, separator: "tab");
            var evaluator = new BinaryClassificationEvaluator();
            BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);

            Console.WriteLine();
            Console.WriteLine("PredictionModel quality metrics evaluation");
            Console.WriteLine("------------------------------------------");
            Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
            Console.WriteLine($"Auc: {metrics.Auc:P2}");
            Console.WriteLine($"F1Score: {metrics.F1Score:P2}");
        }
    }
}
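For reference, the accuracy and F1 score printed by Evaluate follow the standard definitions for binary classification. The symbols TP, TN, FP and FN (true positives, true negatives, false positives and false negatives counted on the test set) are introduced here only for illustration and do not appear in the code:

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN} \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align*}
```

AUC is the area under the ROC curve: 0.5 corresponds to random guessing and 1.0 to a perfect ranking of positive over negative examples.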
Add a new class file and name it SentimentData.cs
- SentimentData.cs Source Code
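using Microsoft.ML.Runtime.Api;

// One row of the input file: column 0 is the sentence, column 1 is the label.
public class SentimentData
{
    [Column(ordinal: "0")]
    public string SentimentText;

    [Column(ordinal: "1", name: "Label")]
    public float Sentiment;
}

// The model's output: true means positive sentiment, false means negative.
public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Sentiment;
}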
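To make the Column ordinals more concrete, here is a minimal sketch of how one line of the data file maps onto the two fields. It assumes each line has the form "sentence, tab, label", which is inferred from the separator: "tab" argument and the ordinals 0 and 1 above. SentimentLineParser is a hypothetical helper written for illustration; it is not part of ML.NET, and TextLoader does this work for you.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not an ML.NET type): splits one "text<TAB>label" line by hand.
public static class SentimentLineParser
{
    public static (string Text, float Label) Parse(string line)
    {
        // Column 0 is the sentence and column 1 is the label,
        // mirroring the ordinals on SentimentData.
        string[] columns = line.Split('\t');
        if (columns.Length < 2)
            throw new FormatException("Expected a tab between the text and the label.");

        string text = columns[0].Trim();
        float label = float.Parse(columns[1].Trim(), CultureInfo.InvariantCulture);
        return (text, label);
    }
}

public static class ParserDemo
{
    public static void Main()
    {
        // An assumed example line, not copied from the dataset.
        var row = SentimentLineParser.Parse("A wonderful little production.\t1");
        Console.WriteLine($"{row.Text} => {row.Label}");
    }
}
```

Printing a few parsed lines this way is a quick check that the files were unzipped correctly before training.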