
Re: [mizar] copy/paste detection in MML




Hi,

at http://lipa.ms.mff.cuni.cz/~urban/mmlcpd/mmlcpd.4.87.985/cpd/00interarticle.cpd.html you can find an additional 823 inter-article copies detected by CPD. For efficiency reasons, CPD was run only on groups of articles whose names start with the same letter (copying mostly occurs within a single article series).
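To illustrate why the same-letter restriction saves work, here is a minimal sketch that counts the article pairs compared with and without the grouping. The article names and the helper functions `group_by_initial` and `candidate_pairs` are hypothetical, and the sketch assumes that batching by first letter means comparing articles only within each group.

```python
from collections import defaultdict
from itertools import combinations

def group_by_initial(articles):
    """Group article names by their first letter (the assumed batching key)."""
    groups = defaultdict(list)
    for name in articles:
        groups[name[0].lower()].append(name)
    return dict(groups)

def candidate_pairs(articles):
    """Article pairs that get compared when detection runs only inside each group."""
    pairs = []
    for names in group_by_initial(articles).values():
        pairs.extend(combinations(sorted(names), 2))
    return pairs

# Hypothetical article names, loosely modelled on MML naming.
articles = ["jordan1", "jordan2", "jordan3", "card_1", "card_2", "xboole_0"]
all_pairs = len(list(combinations(articles, 2)))  # every pair: 15
grouped = len(candidate_pairs(articles))          # 3 (jordan) + 1 (card) + 0 (xboole) = 4
print(all_pairs, grouped)
```

The trade-off is that copying between series with different initial letters is missed, which is acceptable if, as noted above, most copying happens within one series.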

Josef


On Mon, 5 Nov 2007, Josef Urban wrote:


Hi,

at http://lipa.ms.mff.cuni.cz/~urban/mmlcpd/mmlcpd.4.87.985/cpd/ you can find the results of running CPD (the Copy/Paste Detector, http://pmd.sourceforge.net/cpd.html) on each MML article. About four thousand copied blocks were detected; use http://lipa.ms.mff.cuni.cz/~urban/mmlcpd/mmlcpd.4.87.985/cpd/?C=S;O=D to sort the articles by the amount of copying.

I'll probably also add information about inter-article copying later. The detection could also be improved by writing a special Mizar parser for CPD and by using normalized versions of the articles (e.g. with normalized identifier names, which could be done by simple postprocessing of the XML representation).
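A minimal sketch of why normalized identifier names help: two statements that differ only in variable names become identical token sequences once every non-reserved identifier is replaced by a placeholder. This is a simplified token-window matcher, not CPD's actual algorithm; it normalizes raw text rather than the XML representation, and `KEYWORDS`, `tokenize` and `duplicate_windows` are hypothetical names with an intentionally incomplete set of Mizar reserved words.

```python
import re
from collections import defaultdict

# Illustrative, incomplete subset of Mizar reserved words (an assumption).
KEYWORDS = {"theorem", "for", "being", "st", "holds", "be", "let", "proof",
            "end", "thus", "hence", "by", "set", "in", "and", "implies"}

# An identifier, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_']*|\S")

def tokenize(text, normalize=True):
    """Split text into tokens; optionally map non-reserved identifiers to 'ID'."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        is_ident = tok[0].isalpha() or tok[0] == "_"
        if normalize and is_ident and tok not in KEYWORDS:
            tokens.append("ID")
        else:
            tokens.append(tok)
    return tokens

def duplicate_windows(tokens, min_tokens):
    """Map each window of min_tokens tokens that occurs more than once to its start offsets."""
    seen = defaultdict(list)
    for i in range(len(tokens) - min_tokens + 1):
        seen[tuple(tokens[i:i + min_tokens])].append(i)
    return {window: starts for window, starts in seen.items() if len(starts) > 1}

a = "for x being set st x in A holds x in B"
b = "for y being set st y in C holds y in D"

raw = duplicate_windows(tokenize(a + " " + b, normalize=False), 12)
normalized = duplicate_windows(tokenize(a + " " + b), 12)
print(len(raw), list(normalized.values()))
```

Without normalization no 12-token window repeats; with it, the two statements match at token offsets 0 and 12.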

I hope this information will be used to gradually get rid of the worst cases of copying. I also suggest using tools like CPD as part of the reviewing process for MML articles.

Best,
Josef Urban