[mizar] copy/paste detection in MML




Hi,

the results of running CPD (Copy/Paste Detector, http://pmd.sourceforge.net/cpd.html) on each MML article are at http://lipa.ms.mff.cuni.cz/~urban/mmlcpd/mmlcpd.4.87.985/cpd/ . About four thousand copied blocks were detected. Use http://lipa.ms.mff.cuni.cz/~urban/mmlcpd/mmlcpd.4.87.985/cpd/?C=S;O=D to sort the articles by the amount of copying they contain.

I'll probably also add information about inter-article copying later. The detection could also be improved by writing a dedicated Mizar parser for CPD, and by running it on normalized versions of the articles (e.g. with normalized identifier names, which could be done by simple postprocessing of the XML representation).
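
To illustrate the normalization idea, here is a minimal sketch in Python. It is not how CPD works and it does not use the XML representation: it is a crude text-level approximation that replaces every non-keyword word with a placeholder and then looks for repeated windows of lines. The keyword set is a small illustrative subset, the function names are made up for this example, and the sample lines are toy input, not taken from MML.

```python
import re
from collections import defaultdict

# Small illustrative subset of Mizar reserved words (assumption: not the full list).
KEYWORDS = {
    "let", "be", "being", "assume", "thus", "hence", "then", "by", "from",
    "proof", "end", "for", "st", "holds", "ex", "take", "consider", "such",
    "that", "theorem", "and", "or", "not", "implies", "iff", "thesis",
}

# A word (letters, digits, underscore, apostrophe) or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_']*|\S")


def normalize(line):
    """Replace every non-keyword word with ID so that renamed copies compare equal."""
    out = []
    for tok in TOKEN.findall(line):
        is_word = tok[0].isalpha() or tok[0] == "_"
        out.append("ID" if is_word and tok not in KEYWORDS else tok)
    return " ".join(out)


def duplicated_blocks(lines, window=3):
    """Return {normalized block: [1-based start lines]} for blocks seen more than once."""
    # Skip blank lines but remember the original line numbers.
    norm = [(i, normalize(l)) for i, l in enumerate(lines, 1) if l.strip()]
    seen = defaultdict(list)
    for k in range(len(norm) - window + 1):
        block = tuple(text for _, text in norm[k:k + window])
        seen[block].append(norm[k][0])
    return {b: starts for b, starts in seen.items() if len(starts) > 1}


if __name__ == "__main__":
    sample = [
        "let x be Nat;",
        "assume x > 0;",
        "thus x >= 1;",
        "",
        "let y be Nat;",
        "assume y > 0;",
        "thus y >= 1;",
    ]
    for block, starts in duplicated_blocks(sample).items():
        print(starts, block)
```

On the toy input the two three-line blocks differ only in the variable name, so they are reported as one duplicate starting at lines 1 and 5. Mapping every identifier to the same placeholder is deliberately coarse and will overreport; renaming identifiers consistently in order of first occurrence, as postprocessing of the XML would allow, should be more precise.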

I hope this information will be used to gradually get rid of the worst cases of copying. I also suggest using tools like CPD as part of the reviewing process for MML articles.

Best,
Josef Urban