基于形容词删除策略的语言隐写与密钥共享

Adjective Deletion for Linguistic Steganography and Secret Sharing

概念

Adjective Deletion 【形容词删除】
Linguistic Steganography 【语言隐写术】隐写术就是将秘密信息隐藏到看上去普通的信息中进行传送。

Linguistic steganography is a form of covert communication in which information is embedded in a seemly innocent cover text so that the presence of the information is imperceptible to an outside observer (human or computer).

理想的 Linguistic Steganography满足两个基本要求：high imperceptibility（不易察觉） and high payload capacity（高信息承载容量）
Secret Sharing 【密钥共享】一种分发、保存、恢复秘密密钥的方法。

文章所作工作

验证删除形容词的可行性的两种方法：「checking the acceptability of adjective deletion in noun phrases.」
- Google n-gram corpus 【谷歌语料库】「check 删除一个形容词后的 context 的流利程度」
- SVM模型(使用n-gram counts和其他方法训练得到) 「classify 是否在 context 删除形容词」
证明删除形容词技术可以集成到一个存在的语言系统(an existing linguistic stegosyste)
提出一种新的基于形容词删除技术(adjective deletion)的密钥共享(secret sharing)方法

(t,n)-threshold scheme

论文中采用的 secret sharing方法是基于(2, 2)-threshold, 其中共享的必须是两个可比较的文本(two comparable texts)。通过形容词删除技术将【0s 和 1s 的加密位字符串(secret bitstring;)】嵌入到两个文本中，这两个文本可以组合起来，获得秘密位串。

Hence the proposed method is a novel combination of secret sharing and linguistic steganography.

一种密钥共享与语言隐写技术的新颖组合方法？！

Adjective Deletion

在不影响句子流利程度和语义的情况下，可以将一些形容词删除。在下面的例子中，删除 own 这个形容词后，句意并没有发生改变。

he spent only his own money.

he spent only his money.

一种极端情况 adjective-noun ：大致可以理解为正确的废话（正确但duck不必的形容）吧。

unfair prejudice

horrible crime

fragile glass

隐写术种的语言转换(Linguistic Transformations for Steganography)

如：词汇替换、短语意译、句子结构调整、语义转换等【PS：有种毕业论文降重的赶脚】

还有一种研究通过在翻译的文本中嵌入信息。在机器翻译算法中引入水印作为参数，对带有水印的译文进行概率识别。

【Watermarking the outputs of structured prediction with an application in statistical machine translation】

Another recent work proposedby Venugopal et al. (2011) introduces a watermark as a parameter in the machine translation algorithm and probabilistically identifies the watermarked translation.

隐写系统评估

可以从两个方面对系统进行评估：安全性(security level)和嵌入容量( embedding capacity)

security level： automatic evaluation and human evaluation.

automatic evaluation 大概就是使用机器翻译评价指标 BLEU 和 NIST。计算隐藏文本与原始文本之间的距离。

human evaluation 就是认为指定的一套评估标准(seven-point scale)。
embedding capacity

将嵌入的信息按每个语言单位(每个句子或每个单词)比特进行量化。

隐写系统的语言转换和编码方法，以及隐写文本的选择都会影响隐写系统的安全级别和有效负载能力。

句子压缩

句子压缩，文本简化和文本摘要通常涉及删除句子中不重要的词，以使文本更简洁。论文中指出，形容词删除可以用在句子压缩之前或之后。进一步简化句子。

The proposed adjective deletion methods can be applied before and/or after a sentence compression system. Deleting unnecessary adjectives before can help the system focus on other content of a sentence. Deleting unnecessary adjectives after can generate an even more concise sentence.

Deletable Adjective Classification

论文中，为了使一个形容词的删除是可以接受的，使用两个检查：语法性和自然性检查(grammaticality and naturalness checks)。

N-gram Count 方法

计算删除形容词前后文本的 N-gram 统计得分，通过设置一个阈值，来判断删除后的文本是否可接受。

Features for the SVM

支持向量机的特征有：

N-gram Counts
Lexical Association Measures【确定形容词和名词之间的关联程度。】
Noun and Adjective Entropy【名词和形容词熵】
Contextual α-Skew Divergence【上下文的倾斜散度？】

Secret Sharing Scheme

将一个密钥位串分成两个部分share₀和 share₁ 。若目标形容词在share₀ 中保留，则密钥值取0，若目标形容词在share₁中保留，则密钥值取1。

Share0 holds secret bits as 0s and Share1 holds secret bits as 1s

下面是一个密钥位串为 101 的例子：