Java PDFBox: creating Artifact tags for lines and underlines in a tagged PDF
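Question
I am creating an accessible PDF from a tagged PDF. The check reports a "Path object not tagged" error. The PDF contains lines and underlined text, so I am trying to add an "Artifact" tag to the line items that are not yet tagged. I can retrieve these lines using PDFGraphicsStreamEngine. Can anyone help me with this?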
[Screenshots omitted: the PDF page and the PAC3 error]
Recommended answer
You can use the PdfContentStreamEditor class to edit the page content streams as needed, like this:
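// Imports for the PDFBox 2.x and JDK classes used below.
// PdfContentStreamEditor is the helper class referred to above and must be on the classpath as well.
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdfwriter.ContentStreamWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

PDDocument document = ...;
for (PDPage page : document.getDocumentCatalog().getPages()) {
    PdfContentStreamEditor markEditor = new PdfContentStreamEditor(document, page) {
        // Nesting depth of the marked content already present in the stream; 0 means unmarked.
        int markedContentDepth = 0;

        @Override
        public void beginMarkedContentSequence(COSName tag, COSDictionary properties) {
            if (inArtifact) {
                // A path was started but not painted before this marked content sequence began.
                System.err.println("Structural error in content stream: Path not properly closed by path painting instruction.");
            }
            markedContentDepth++;
            super.beginMarkedContentSequence(tag, properties);
        }

        @Override
        public void endMarkedContentSequence() {
            markedContentDepth--;
            super.endMarkedContentSequence();
        }

        // True while we are inside an /Artifact sequence opened by this editor.
        boolean inArtifact = false;

        @Override
        protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
            String operatorString = operator.getName();
            boolean unmarked = markedContentDepth == 0;
            boolean inArtifactBefore = inArtifact;

            // Open an artifact before the first construction operator of an unmarked path.
            if (unmarked && (!inArtifactBefore) && PATH_CONSTRUCTION.contains(operatorString)) {
                super.write(contentStreamWriter, Operator.getOperator("BMC"), Collections.singletonList(COSName.ARTIFACT));
                inArtifact = true;
            }

            // Write the original instruction unchanged.
            super.write(contentStreamWriter, operator, operands);

            // Close the artifact right after the operator that paints (or ends) the path.
            if (unmarked && inArtifactBefore && PATH_PAINTING.contains(operatorString)) {
                super.write(contentStreamWriter, Operator.getOperator("EMC"), Collections.emptyList());
                inArtifact = false;
            }
        }

        final List<String> PATH_CONSTRUCTION = Arrays.asList("m", "l", "c", "v", "y", "h", "re");
        final List<String> PATH_PAINTING = Arrays.asList("s", "S", "f", "F", "f*", "B", "B*", "b", "b*", "n");
    };
    markEditor.processPage(page);
}
document.save(...);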
(EditMarkedContent test testMarkUnmarkedPathsAsArtifactsTradeSimple1)
The beginMarkedContentSequence and endMarkedContentSequence overrides track the current marked content nesting depth, in particular whether the current content is marked at all.
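For instructions that are not yet marked, the write override encloses each sequence of unmarked path construction and painting instructions in /Artifact BMC ... EMC.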
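To see the wrapping rule in isolation, here is a minimal sketch that applies the same state machine to plain instruction strings instead of a real content stream. It does not use PDFBox; the class name ArtifactWrapSketch, the methods operatorOf and wrap, and the convention that the operator is the last whitespace-separated token of each instruction are assumptions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Token-level illustration of the wrapping rule; hypothetical helper, not part of PDFBox. */
final class ArtifactWrapSketch {
    static final List<String> PATH_CONSTRUCTION = Arrays.asList("m", "l", "c", "v", "y", "h", "re");
    static final List<String> PATH_PAINTING = Arrays.asList("s", "S", "f", "F", "f*", "B", "B*", "b", "b*", "n");

    /** Assumption of this sketch: the operator is the last whitespace-separated token. */
    static String operatorOf(String instruction) {
        String trimmed = instruction.trim();
        return trimmed.substring(trimmed.lastIndexOf(' ') + 1);
    }

    /** Returns the instructions with every unmarked path enclosed in "/Artifact BMC" ... "EMC". */
    static List<String> wrap(List<String> instructions) {
        List<String> out = new ArrayList<>();
        int markedContentDepth = 0;   // nesting depth of marked content found in the input
        boolean inArtifact = false;   // true while inside an artifact opened by this method
        for (String instruction : instructions) {
            String op = operatorOf(instruction);
            boolean unmarked = markedContentDepth == 0;
            boolean inArtifactBefore = inArtifact;
            // Open an artifact before the first construction operator of an unmarked path.
            if (unmarked && !inArtifactBefore && PATH_CONSTRUCTION.contains(op)) {
                out.add("/Artifact BMC");
                inArtifact = true;
            }
            out.add(instruction);
            // Close the artifact right after the painting operator that ends the path.
            if (unmarked && inArtifactBefore && PATH_PAINTING.contains(op)) {
                out.add("EMC");
                inArtifact = false;
            }
            // Keep track of marked content that already exists in the input.
            if (op.equals("BMC") || op.equals("BDC")) {
                markedContentDepth++;
            } else if (op.equals("EMC")) {
                markedContentDepth--;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : wrap(Arrays.asList("q", "10 10 m", "100 10 l", "S", "Q"))) {
            System.out.println(line);
        }
    }
}
```

In this sketch, a path that already sits inside a marked content sequence such as /P BMC ... EMC is passed through unchanged, while an unmarked line such as "10 10 m", "100 10 l", "S" comes out enclosed in "/Artifact BMC" and "EMC".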
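Beware: this code only considers the content of the immediate page content stream. It does not descend into form XObjects, patterns, and so on.
Furthermore, if the content stream contains errors (for example, a path that is constructed but never painted), this code may add further errors (for example, unbalanced marked content begins and ends).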
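One way to guard against the second caveat is to check the operator sequence for these two kinds of error before or after editing. The following is only a sketch under stated assumptions: it works on a plain list of operator names rather than on a parsed PDF, it does not use PDFBox, and the class ContentStreamCheckSketch and its method findProblems are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical check on a list of content stream operator names; not part of PDFBox. */
final class ContentStreamCheckSketch {
    static final List<String> PATH_CONSTRUCTION = Arrays.asList("m", "l", "c", "v", "y", "h", "re");
    static final List<String> PATH_PAINTING = Arrays.asList("s", "S", "f", "F", "f*", "B", "B*", "b", "b*", "n");

    /** Reports unpainted paths and unbalanced marked content sequences. */
    static List<String> findProblems(List<String> operators) {
        List<String> problems = new ArrayList<>();
        int depth = 0;            // open marked content sequences
        boolean pathOpen = false; // a path was constructed but not yet painted
        for (int i = 0; i < operators.size(); i++) {
            String op = operators.get(i);
            if (PATH_CONSTRUCTION.contains(op)) {
                pathOpen = true;
            } else if (PATH_PAINTING.contains(op)) {
                pathOpen = false;
            } else if (op.equals("BMC") || op.equals("BDC")) {
                if (pathOpen) {
                    problems.add("marked content begins inside an unpainted path at index " + i);
                }
                depth++;
            } else if (op.equals("EMC")) {
                if (depth == 0) {
                    problems.add("EMC without matching BMC/BDC at index " + i);
                } else {
                    depth--;
                }
            }
        }
        if (pathOpen) {
            problems.add("path constructed but never painted");
        }
        if (depth > 0) {
            problems.add(depth + " marked content sequence(s) never closed");
        }
        return problems;
    }

    public static void main(String[] args) {
        // A path that is built but never painted, the error case mentioned above.
        System.out.println(findProblems(Arrays.asList("m", "l")));
    }
}
```

An empty result means the sequence has neither of the two problems this sketch looks for; it is not a full validation of the content stream.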