
ChatGPT Moderation: How can AIGC be aligned with human values? How can PARSNIPs language and ecological/destructive discourse be identified?

2023-07-22 17:02 Author: biggertree-Jing

ChatGPT API Moderation model

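To keep artificial intelligence aligned with healthy human values, OpenAI has built a Moderation Model for the ChatGPT API. Its purpose is to identify malicious language and instructions such as pornography, violence, insults, and vulgarity. This goal coincides with several needs in language work: screening out PARSNIPs language in English teaching (note: PARSNIPs refers to sensitive topics related to politics, alcohol, religion, sex, narcotics, -isms, and pork), distinguishing ecological from destructive discourse in ecological discourse analysis, and identifying ideology (values) in critical discourse analysis. The model therefore has strong potential in language teaching applications, discourse research, and the development of teaching materials. For this reason, the following article is reposted here in the hope that it will be helpful.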


In this article, discover what the ChatGPT API Moderation model is, which 7 categories it uses, and how to call it and interpret its output.

ChatGPT API Moderation model

The OpenAI API can classify any text to check whether it complies with OpenAI's usage policies, using a binary classification. This classification is provided by the Moderation model, which can be called through the openai API in Python.

7 categories are used in the OpenAI model: Hate, Hate/Threatening, Self-harm, Sexual, Sexual/minors, Violence, Violence/graphic.

These categories can be used to filter inappropriate content (comments on a website, inputs from clients in chatbot requests…).

Source: OpenAI documentation – 7 categories in Moderation Model
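The filtering use case mentioned above can be sketched roughly as follows. This is only an illustration: the function name `filter_comments` and the stand-in check `demo_check` are hypothetical, and in a real project the check would be backed by a call to the Moderation model rather than by a keyword test.

```python
def filter_comments(comments, is_flagged):
    """Keep only the comments that the moderation check does not flag."""
    return [comment for comment in comments if not is_flagged(comment)]


def demo_check(text):
    # Stand-in for demonstration only; a real check would query the Moderation model.
    return "BLOCKED" in text


kept = filter_comments(["I love chocolate", "BLOCKED example"], demo_check)
print(kept)  # prints ['I love chocolate']
```

Passing the check in as a parameter keeps the filtering logic independent of how the moderation decision is obtained.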


OpenAI API Moderation method

The method to call for the moderation classification is: openai.Moderation.create
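A minimal sketch of such a call, assuming the legacy `openai` Python package (versions before 1.0, where `openai.Moderation.create` is available) and an API key in the `OPENAI_API_KEY` environment variable. The helper name `moderate` and its `create` parameter are hypothetical; the parameter exists only so that the function can be tried with a stand-in instead of a real network request.

```python
def moderate(text, create=None):
    """Return the first moderation result for `text` as a dict-like object."""
    if create is None:
        # Imported lazily so the sketch can be exercised without the package.
        # Assumption: openai<1.0, which reads OPENAI_API_KEY from the environment.
        import openai
        create = openai.Moderation.create
    response = create(input=text)
    # A single input string yields a single entry in "results".
    return response["results"][0]


# Example usage (performs a real API request, so it is left commented out):
# result = moderate("I love chocolate")
# print(result["flagged"])
```

In versions 1.0 and later of the package the same endpoint is reached through a client object (`OpenAI().moderations.create(input=...)`), so the sketch above would need adapting there.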

The answer is a JSON object that contains:

  • model: The model used; at the time of writing it is called “text-moderation-004”.

  • results: in which you have:

    • categories: For each of the 7 categories, a binary classification:

      • true: if the input text violates the given category

      • false: if it does not

    • category_scores: For each category, a score is calculated. It is not a probability. The lower the score, the less likely the content violates the category; the higher the score, the more likely it does.

    • flagged: The final classification of the input:

      • “false” if the input text does not violate OpenAI’s policies.

      • “true” if it does: if at least one category is true, this flag is set to true too.
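The fields described above can be read with a few lines of Python. The response below is illustrative only: the `id` and all score values are made up, and the sketch assumes that `flagged` sits next to `categories` and `category_scores` inside each entry of `results`. The helper name `summarize` is hypothetical.

```python
import json

# Illustrative response only: the id and all score values are made up.
SAMPLE_RESPONSE = """
{
  "id": "modr-EXAMPLE",
  "model": "text-moderation-004",
  "results": [
    {
      "categories": {
        "hate": false,
        "hate/threatening": false,
        "self-harm": false,
        "sexual": false,
        "sexual/minors": false,
        "violence": false,
        "violence/graphic": false
      },
      "category_scores": {
        "hate": 0.0001,
        "hate/threatening": 0.00001,
        "self-harm": 0.00001,
        "sexual": 0.0002,
        "sexual/minors": 0.00001,
        "violence": 0.0001,
        "violence/graphic": 0.00001
      },
      "flagged": false
    }
  ]
}
"""


def summarize(response):
    """Collect the model name, the final flag, and the violated categories."""
    result = response["results"][0]
    violated = [name for name, hit in result["categories"].items() if hit]
    return {
        "model": response["model"],
        "flagged": result["flagged"],
        "violated": violated,
    }


summary = summarize(json.loads(SAMPLE_RESPONSE))
print(summary)  # prints {'model': 'text-moderation-004', 'flagged': False, 'violated': []}
```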

Moderation API Call

Standard Call

The classification of the prompt “I love chocolate” is “false”, meaning it does not violate any of the above categories.

Here is the detailed output:

All scores are very low, thus the given categories are all “false”.

Call with a violation

The prompt given in the following request is just for illustration. It is not a personal opinion.

The output is “true”, meaning there is a violation. This is because the input violates the first category “hate” with a score of 0.52, while the other categories are all showing very low scores.
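To see which category drives a flagged result, one option is to pick the highest entry in the category scores. In the sketch below only the 0.52 for "hate" comes from the example above; the other values are made-up placeholders standing in for "very low", and the helper name `top_category` is hypothetical.

```python
# Illustrative scores: only 0.52 for "hate" is taken from the article;
# the remaining values are made-up placeholders for "very low".
category_scores = {
    "hate": 0.52,
    "hate/threatening": 0.0001,
    "self-harm": 0.00001,
    "sexual": 0.00002,
    "sexual/minors": 0.00001,
    "violence": 0.0003,
    "violence/graphic": 0.00001,
}


def top_category(scores):
    """Return the (name, score) pair with the highest score."""
    name = max(scores, key=scores.get)
    return name, scores[name]


name, score = top_category(category_scores)
print(f"{name}: {score:.2f}")  # prints "hate: 0.52"
```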

Some variants

When the input is phrased as a personal belief, the model classifies it correctly. However, when the same idea is phrased as a general opinion, the model does not classify it as violating the policies.

Here is an example where the classification is “false” even though the input has a negative connotation:

Here is another variant, where a single comma changes the score considerably (the classification in both cases is “true”):

The score is about 0.66.

Here the score is about 0.954 (with a single added comma):

Summary

In this article, you have learned how to use the ChatGPT API Moderation model, which you can put in place in your own project or website to screen out inappropriate inputs or comments.

I hope you enjoyed reading the article. Leave me a SanLian :-)


The English part of this article is reposted from: https://machinelearning-basics.com/chatgpt-api-moderation-model/
