[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
Re: [mizar] an extension of the mizar language for dealing with article metadata
Dear Jesse,
In Algol 60 a comment was a part of the language. The syntax was
comment TEXT ;
where TEXT was an arbitrary string of characters ( the semicolon itself
was not allowed)
We treat now comments as something that actually is not a part of the
language. I would rather
have "pure Mizar" with only these informations that are necessary for
the verifier.
We have now some features that should not be in pure Mizar and probably
should be replaced by pragmas:
they may be classified in the following categories:
i. temporary pragmas: '@proof' - to suspend verification
ii. library pragmas: 'canceled' - to keep numeration correct
iii. editorial pragmas: repeated 'begin' - to divide article into section.
iv. maybe some others
We tried - on our seminar - to find common solution, but the categories
are so different that probably
there is none.
Recently, after reading your letter, we discussed the problem again with
Czeslaw and it seems that for the metadata the proper solution is to
have an XML-envelope of the Mizar article. Maybe something like
<DOCUMENT>
"the place for metadata"
<ARTICLE>
"the Mizar article"
</ARTICLE>
</DOCUMENT>
(mind you, I know almost nothing on XML, so it might quite naive).
Then from such XML document we may extract the Mizar article and process
it in the usual way.
"the Mizar article" is probably not the exact citation: we have '<' as a
symbol in Mizar. Still extracting the Mizar article should not be too
complicated. Maybe we can be more precise and have the form
<DOCUMENT>
"the place for metadata"
<ARTICLE>
<ENVIRONMENT DECLARATION>
"directives"
</ENVIRONMENT DECLARATION>
<TEXT PROPER>
"the text proper"
</TEXT PROPER>
</ARTICLE>
</DOCUMENT>
Then, 'environ' and 'begin' (if necessary) may be added while extracting
the Mizar article. The directives and the text proper are written in
two different languages, anyway.
<DOCUMENT>
"the place for metadata"
<ARTICLE>
<ENVIRONMENT DECLARATION>
"directives"
</ENVIRONMENT DECLARATION>
<TEXT PROPER>
<SECTION>
"the text proper"
</SECTION>
...
<SECTION>
"the text proper"
</SECTION>
</TEXT PROPER>
</ARTICLE>
</DOCUMENT>
Then we may put in the <SECTION> the title and maybe even the summary of
the section. And the grammar of Mizar will be a little bit simpler: no
repetition of 'begin'.
Regards,
Andrzej
Jesse Alama wrote:
As it stands now, the title of an article, its author(s), and the name
of sections aren't actually a formal part of a mizar text: we
customarily include this information as comments
:: Introduction to XYZ Theory
::
:: by Joe the MIZAR Writer
environ
...
begin :: Preliminaries
...
begin :: Main Lemma
...
begin :: Main Theorem
...
I propose to put this information into the body of the article, to move
it from comments to non-comments. Perhaps we could introduce new
keywords like `title', `author', and `section'. Then, articles could
look like
title{Introduction to XYZ Theory}
author{Joe the MIZAR Writer}
environ
<stuff>
begin{Preliminaries}
definition ...
theorem ...
theorem ...
begin{Main Lemma}
definition ...
begin{Main Theorem}
theorem ...
This would make the sections and article title part of the text; as it
stands, these features are not represented in the mizar grammar (because
customarily they are tucked away within comments).
The primary motivation for this idea is to help improve Josef Urban's
semantic MML presentation. Currently he has to do some hacking to
extract metadata like article titles and authors; extracting section
information (currently unsupported) would be similarly complicated. The
problem is that the article metadata isn't represented in the XML
representation of an article; it has to be inferred from the original
mizar source.
Additionally, this new syntax might help to simplify processing of FM
articles: with this approach one wouldn't need the BibTeX fields
SECTION1, SECTION2, etc., because these could be inferred from the
article.
Thoughts?
Best,
Jesse