#M1010. Sources

1. File feature extraction

Average line width

$$d_i = \frac{w_i}{m_i}$$

Kernel density estimation weights

$$\rho_i = \frac{1}{\sqrt{2\pi}\hat{\sigma}} \sum_{j=1}^n \exp\left(-\frac{(d_i-d_j)^2}{2\hat{\sigma}^2}\right),\quad \hat{\sigma} = \frac{1}{n-1}\sum_{i=1}^n (d_i - \bar{d})^2$$

$$\bar{d} = \frac{1}{n}\sum_{i=1}^n d_i$$

Weighted average line width

$$\bar{d}_\omega = \frac{\sum_{i=1}^n \rho_i d_i}{\sum_{i=1}^n \rho_i}$$
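The three definitions above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, and it assumes that $w_i$ is the total character count and $m_i$ the line count of file $i$, which the text does not state. The bandwidth $\hat{\sigma}$ is used exactly as defined above, with no square root.

```python
import math

def line_widths(w, m):
    """d_i = w_i / m_i (assumed: w_i = characters, m_i = lines of file i)."""
    return [wi / mi for wi, mi in zip(w, m)]

def kde_weights(d):
    """Gaussian kernel density weight rho_i for every d_i."""
    n = len(d)
    d_bar = sum(d) / n
    # sigma_hat as written in the text: (n-1)-normalised sum of squared deviations
    sigma_hat = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    norm = 1.0 / (math.sqrt(2 * math.pi) * sigma_hat)
    return [
        norm * sum(math.exp(-((di - dj) ** 2) / (2 * sigma_hat ** 2)) for dj in d)
        for di in d
    ]

def weighted_mean(d, rho):
    """Density-weighted average line width."""
    return sum(r * x for r, x in zip(rho, d)) / sum(rho)
```

The sketch needs at least two files with differing widths, since otherwise $\hat{\sigma}$ is zero and the weights are undefined.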

Line-width standard deviation

$$\sigma_d = \sqrt{ \frac{1}{n} \sum_{i=1}^n (d_i - \bar{d}_\omega)^2 }$$

Skewness and kurtosis

$$\gamma_d = \frac{ \frac{1}{n} \sum_{i=1}^n (d_i - \bar{d}_\omega)^3 }{ \sigma_d^3 }$$

$$\kappa_d = \frac{ \frac{1}{n} \sum_{i=1}^n (d_i - \bar{d}_\omega)^4 }{ \sigma_d^4 } - 3$$
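As a rough sketch, the standard deviation, skewness, and excess kurtosis can be computed in one pass over the deviations from $\bar{d}_\omega$. The function name `weighted_moments` is hypothetical, and the weighted mean is passed in as a plain number.

```python
import math

def weighted_moments(d, d_w):
    """Return (sigma_d, gamma_d, kappa_d), all centred on the weighted mean d_w."""
    n = len(d)
    dev = [x - d_w for x in d]
    sigma = math.sqrt(sum(e ** 2 for e in dev) / n)
    gamma = (sum(e ** 3 for e in dev) / n) / sigma ** 3      # skewness
    kappa = (sum(e ** 4 for e in dev) / n) / sigma ** 4 - 3  # excess kurtosis
    return sigma, gamma, kappa
```

For three widths 40, 80, 120 centred on 80, the distribution is symmetric, so the skewness is 0 and the excess kurtosis is -1.5.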

2. Sparsity measures

Fractional-order sparsity ($\alpha = 0.5$)

$$s_i = \frac{2m_i}{\sqrt{w_i + 1}}$$

Project sparsity ($L_3$ norm)

$$S_p = \left( \frac{1}{n} \sum_{i=1}^n s_i^3 \right)^{1/3}$$
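A minimal sketch of both sparsity measures follows. The function names are hypothetical, and `w` and `m` are the same per-file quantities used for the line widths.

```python
import math

def file_sparsity(w, m):
    """s_i = 2 m_i / sqrt(w_i + 1) for every file."""
    return [2 * mi / math.sqrt(wi + 1) for wi, mi in zip(w, m)]

def project_sparsity(s):
    """S_p: cube root of the mean of s_i cubed (a normalised L3 norm)."""
    return (sum(x ** 3 for x in s) / len(s)) ** (1.0 / 3.0)
```

With `w = [3, 8]` and `m = [1, 3]` the per-file values are 1 and 2, so $S_p^3 = (1 + 8)/2 = 4.5$.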

3. Consistency measures

Covariance kernel matrix

$$\Sigma_{ij} = \exp\left( -\frac{|d_i - d_j|^2}{2\tau^2} \right),\quad \tau = \frac{1}{n}\sum_{i=1}^n d_i$$

Degree matrix

$$D_{ii} = \sum_{j=1}^n \Sigma_{ij}$$

Normalized graph Laplacian

$$L = D^{-1/2}(D - \Sigma)D^{-1/2}$$
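The kernel matrix, degree matrix, and normalized Laplacian can be assembled with nested lists. This is only a sketch: the function name `normalized_laplacian` is hypothetical, and the degree matrix is stored as a list of its diagonal entries.

```python
import math

def normalized_laplacian(d):
    """Return (Sigma, deg, L) for line widths d."""
    n = len(d)
    tau = sum(d) / n  # kernel scale: the plain mean of d
    sigma = [
        [math.exp(-((d[i] - d[j]) ** 2) / (2 * tau ** 2)) for j in range(n)]
        for i in range(n)
    ]
    deg = [sum(row) for row in sigma]  # D_ii = row sums of Sigma
    lap = [
        [
            ((deg[i] if i == j else 0.0) - sigma[i][j]) / math.sqrt(deg[i] * deg[j])
            for j in range(n)
        ]
        for i in range(n)
    ]
    return sigma, deg, lap
```

A quick sanity check is that $L$ is symmetric and that $D^{1/2}\mathbf{1}$ lies in its null space, which is why a zero eigenvalue always exists.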

Spectral radius and smallest nonzero eigenvalue

$$\lambda_{\max} = \max\{|\lambda| : \lambda \in \sigma(L)\}$$

$$\lambda_{\min} = \min\{|\lambda| : \lambda \in \sigma(L), \lambda \neq 0\}$$

Style consistency coefficient

$$C = \frac{\lambda_{\min}}{\lambda_{\max}+1} \cdot \frac{\bar{d}_\omega}{\sigma_d+1}$$
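The text does not say how the spectrum of $L$ is obtained. One possible approach for a small symmetric matrix is the cyclic Jacobi rotation method, sketched below together with the coefficient $C$. The function names and the `zero_tol` threshold used to decide which eigenvalues count as zero are assumptions made for illustration.

```python
import math

def jacobi_eigenvalues(a, sweeps=50, tol=1e-12):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # angle that zeroes a[p][q]; kept within [-pi/4, pi/4]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def consistency(eigs, d_w, sigma_d, zero_tol=1e-9):
    """C = lambda_min / (lambda_max + 1) * d_w / (sigma_d + 1)."""
    mags = [abs(x) for x in eigs]
    lam_max = max(mags)
    lam_min = min(x for x in mags if x > zero_tol)  # smallest nonzero magnitude
    return lam_min / (lam_max + 1) * d_w / (sigma_d + 1)
```

For example, eigenvalues 0, 0.5, 1.5 with $\bar{d}_\omega = 80$ and $\sigma_d = 31$ give $C = (0.5/2.5)\cdot(80/32) = 0.5$.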

4. Global style coefficient

Energy functional

$$\mathcal{E}[k] = \int_0^\infty \left| S_p e^{-k t} - C (1-e^{-t}) \right|^2 dt$$

Closed-form solution of the variational minimum

$$k = \mathop{\arg\min}_{k \in \mathbb{R}^+} \mathcal{E}[k] = \frac{S_p}{C} \cdot \frac{\lambda_{\min}+\lambda_{\max}}{2}$$

5. File style value

Laplacian eigendecomposition

$$L\psi_j = \lambda_j \psi_j,\quad d_i = \sum_{j=1}^n a_j \psi_j(i)$$

Edgeworth expansion correction

$$SC_i = 100 \cdot \frac{d_i}{\bar{d}_\omega} \cdot k \cdot \left(1 + \gamma_d \cdot \frac{d_i - \bar{d}_\omega}{\sigma_d} + \frac{\kappa_d}{2} \cdot \left( \frac{d_i - \bar{d}_\omega}{\sigma_d} \right)^2 \right)$$
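The corrected style value is a direct evaluation of the formula once the moments and $k$ are known. In the sketch below the function name `style_values` is hypothetical and all statistics are passed in as plain numbers.

```python
def style_values(d, d_w, sigma_d, gamma_d, kappa_d, k):
    """SC_i for every line width d_i, with the skewness/kurtosis correction."""
    out = []
    for x in d:
        z = (x - d_w) / sigma_d  # standardised deviation from the weighted mean
        correction = 1 + gamma_d * z + (kappa_d / 2) * z ** 2
        out.append(100 * (x / d_w) * k * correction)
    return out
```

A file whose width equals $\bar{d}_\omega$ has $z = 0$, so its value reduces to $100k$.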

6. Project statistics

Mean

$$\overline{SC} = \frac{1}{n}\sum_{i=1}^n SC_i$$

Variance

$$\sigma_{SC}^2 = \frac{1}{n}\sum_{i=1}^n (SC_i - \overline{SC})^2$$

Style entropy

$$\mathcal{H}_{SC} = -\sum_{i=1}^n \frac{|SC_i|}{\sum_{j=1}^n |SC_j|} \ln \frac{|SC_i|}{\sum_{j=1}^n |SC_j|}$$
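The mean, variance, and style entropy can be sketched as a single function. The name `project_statistics` is hypothetical; terms with zero probability are skipped, following the usual convention $0 \ln 0 = 0$, which the text does not spell out.

```python
import math

def project_statistics(sc):
    """Return (mean, variance, entropy) of the style values."""
    n = len(sc)
    mean = sum(sc) / n
    var = sum((x - mean) ** 2 for x in sc) / n
    total = sum(abs(x) for x in sc)
    probs = [abs(x) / total for x in sc]  # normalised magnitudes
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return mean, var, entropy
```

Four identical style values give zero variance and the maximal entropy $\ln 4$.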

7. File match score

Heat kernel

$$h_t(i,j) = \sum_{l=1}^n e^{-\lambda_l t} \psi_l(i) \psi_l(j),\quad t = \sigma_{SC}^2$$

Match score

$$\text{Match}_i = \sum_{j=1}^n h_t(i,j) \cdot \frac{\exp\left( -\dfrac{(SC_j - \overline{SC})^2}{2\sigma_{SC}^2} \right)} {\sum_{l=1}^n \exp\left( -\dfrac{(SC_l - \overline{SC})^2}{2\sigma_{SC}^2} \right)}$$
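A sketch of the match score, assuming the eigenpairs of $L$ are already available: `lams[l]` is the eigenvalue and `psi[l][i]` the $i$-th component of the corresponding orthonormal eigenvector. These argument names and the function name are hypothetical.

```python
import math

def match_scores(lams, psi, sc):
    """Match_i: heat kernel h_t(i, j) averaged with Gaussian weights on SC_j."""
    n = len(sc)
    mean = sum(sc) / n
    t = sum((x - mean) ** 2 for x in sc) / n  # diffusion time t = sigma_SC^2
    g = [math.exp(-((x - mean) ** 2) / (2 * t)) for x in sc]
    g_sum = sum(g)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            h = sum(math.exp(-lam * t) * v[i] * v[j] for lam, v in zip(lams, psi))
            total += h * g[j] / g_sum
        scores.append(total)
    return scores
```

The style values must not all be equal, since $t = \sigma_{SC}^2 = 0$ would make the Gaussian weights undefined.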

8. Project health

Bregman divergence ($\varphi(x) = x\ln x$)

$$B_\varphi(x,y) = x\ln\frac{x}{y} - (x-y)$$

Visual health index

$$H = 100 \cdot \exp\left( -B_\varphi\!\left( \frac{\bar{d}_\omega}{100},\; \frac{C}{k+1} \right) \right)$$
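The health index can be sketched as two small functions. The names are hypothetical, and both arguments of the divergence are assumed to be strictly positive so that the logarithm is defined.

```python
import math

def bregman_kl(x, y):
    """Bregman divergence generated by phi(x) = x ln x (requires x, y > 0)."""
    return x * math.log(x / y) - (x - y)

def health_index(d_w, c, k):
    """H = 100 * exp(-B(d_w / 100, C / (k + 1)))."""
    return 100 * math.exp(-bregman_kl(d_w / 100, c / (k + 1)))
```

The divergence is zero only when both arguments coincide, so $H$ reaches its maximum of 100 exactly when $\bar{d}_\omega / 100 = C/(k+1)$.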

9. Criterion for a good project

Golden visual interval

$$80 \;\leqslant\; \overline{SC} \;\leqslant\; 120$$