Re: [mizar] copy/paste detection in MML
Hi,
the page at
http://lipa.ms.mff.cuni.cz/~urban/mmlcpd/mmlcpd.4.87.985/cpd/00interarticle.cpd.html
lists an additional 823 inter-article copies detected by CPD. For efficiency
reasons, CPD was run only on subsets of articles starting with the same
letter (copying mostly occurs within a single article series).
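
The first-letter batching described above could be sketched roughly as follows. This is only an illustration in Python, not the script actually used: the function name and the sample file list are assumptions, and each resulting batch would then be handed to CPD separately.

```python
from collections import defaultdict

def group_by_first_letter(article_names):
    """Group article file names by their lowercased first character."""
    groups = defaultdict(list)
    for name in article_names:
        if not name:
            continue  # skip empty entries rather than fail on name[0]
        groups[name[0].lower()].append(name)
    return dict(groups)

# Sample file names, chosen for illustration only.
articles = ["jordan1.miz", "jordan2c.miz", "jgraph_1.miz", "topreal1.miz"]
batches = group_by_first_letter(articles)

for letter, names in sorted(batches.items()):
    print(letter, names)
```

Comparing only within a batch keeps each run small, at the cost of missing copies between articles whose names start with different letters.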
Josef
On Mon, 5 Nov 2007, Josef Urban wrote:
Hi,
at http://lipa.ms.mff.cuni.cz/~urban/mmlcpd/mmlcpd.4.87.985/cpd/ are the results
of running CPD (Copy/Paste Detector, http://pmd.sourceforge.net/cpd.html)
on each MML article. About four thousand copied blocks were detected; use
http://lipa.ms.mff.cuni.cz/~urban/mmlcpd/mmlcpd.4.87.985/cpd/?C=S;O=D to sort
the articles by their amount of copying.
I'll probably also add information about inter-article copying later. The
detection could also be improved by writing a dedicated Mizar parser for CPD
and by using normalized versions of articles (e.g. with normalized identifier
names, which could be done by simple postprocessing of the XML representation).
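
To illustrate what identifier normalization buys, here is a rough Python sketch. It works on plain text with a regular expression rather than on the XML representation mentioned above, so it is a stand-in technique, not the proposed postprocessing. The RESERVED set is a small, incomplete, assumed subset of Mizar reserved words, and every other word is treated as an identifier, which over-normalizes real articles.

```python
import re

# Assumption: a tiny, incomplete subset of Mizar reserved words.
RESERVED = {"let", "be", "being", "for", "st", "holds", "ex", "thus",
            "proof", "end", "by", "assume", "take", "set"}

# A word (letters, digits, underscore, apostrophe) or any single
# non-space character.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_']*|\S")

def normalize(text):
    """Replace each distinct non-reserved word by id1, id2, ... in
    order of first appearance, so renamed copies compare equal."""
    mapping = {}
    out = []
    for tok in TOKEN.findall(text):
        is_word = tok[0].isalpha() or tok[0] == "_"
        if is_word and tok not in RESERVED:
            tok = mapping.setdefault(tok, f"id{len(mapping) + 1}")
        out.append(tok)
    return " ".join(out)

a = normalize("let x be set; thus x = x;")
b = normalize("let y be set; thus y = y;")
print(a == b)
```

Two fragments that differ only in variable names normalize to the same token sequence, so a token-based detector such as CPD would then report them as duplicates.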
I hope this information will be used to gradually get rid of the worst cases
of copying. I also suggest using tools like CPD as part of the reviewing
process for MML articles.
Best,
Josef Urban