Monte Carlo Semantics: Robust Inference and Logical Pattern Processing with Natural Language Text. thesis submitted to the University of Cambridge in partial fulfillment of the degree requirements for the Doctor of Philosophy, July 2010.
This thesis develops several pieces of theory and computational techniques which can be deployed for the purpose of allowing a computer to analyze short pieces of text (e.g. "Socrates is a man and every man is mortal.") and, on the basis of such an analysis, to decide yes/no questions about the text ("Is Socrates mortal?"). More particularly, the problem is seen as a logical inferencing task. The computer must decide whether or not a logical consequence relation "therefore" holds between the two pieces of text. ("Socrates is a man and every man is mortal, therefore Socrates is mortal.")
This problem is a pervasive theme in logic and semantics but has also been subject over the last five years to a wave of renewed attention in computational linguistics sparked by the Recognizing Textual Entailment (RTE) challenge. A critical reevaluation of this line of work is presented here which demonstrates several problems concerning the empirical methodology used at RTE and the results derived from it. This thesis is thus more theory-driven, but nevertheless inspired by RTE in that it addresses problems raised by RTE which have not previously received sufficient attention from a theoretical viewpoint, such as the problem of robustness.
With this goal in mind, two of the results on Natural Language Reasoning (NLR) established here become particularly important: (1) Assuming the syllogism as a benchmark fragment of NLR, the model theory which underlies NLR is not necessarily a two-valued logic, but it can be the many-valued Łukasiewicz logic. (2) Despite the fact that the syllogism is a logical language of less expressive power than natural language as a whole, a good approximation to NLR can still be obtained by using the method outlined here for rewriting natural language text into syllogistic premises.
These two properties of NLR enable the approach to robust inference and logical pattern processing called Monte Carlo semantics, which, in turn, demonstrates that a single logically based theory can account for the semantic informativity of deep techniques using theorem proving and for the robustness of bag-of-words shallow inference.
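As a brief illustration of the many-valued setting mentioned in point (1), the following is a minimal sketch of the standard Łukasiewicz connectives on truth values in [0, 1], applied to the Socrates example. It is not the implementation from the thesis: the function names and the numeric truth degrees are assumptions chosen purely for illustration.

```python
# Standard Lukasiewicz connectives on truth degrees in [0, 1].
# Function names and example degrees are hypothetical, for illustration only.

def luk_not(a: float) -> float:
    """Negation: 1 - a."""
    return 1.0 - a

def luk_and(a: float, b: float) -> float:
    """Strong conjunction (Lukasiewicz t-norm): max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def luk_implies(a: float, b: float) -> float:
    """Implication (residuum of the t-norm): min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)

def modus_ponens_lower_bound(premise: float, rule: float) -> float:
    """Lower bound on the conclusion's degree, given the degree of the
    premise and of the rule 'premise implies conclusion'."""
    return luk_and(premise, rule)

# Assumed degrees: "Socrates is a man" holds to degree 0.9 and
# "every man is mortal" (instantiated for Socrates) to degree 0.8.
socrates_is_man = 0.9
man_implies_mortal = 0.8
bound = modus_ponens_lower_bound(socrates_is_man, man_implies_mortal)
print(f"'Socrates is mortal' holds to degree at least {bound:.2f}")
```

With crisp inputs (0 or 1) these connectives coincide with two-valued logic, so the classical syllogism is recovered as a special case; with intermediate degrees, an imperfectly supported premise lowers the degree of the conclusion rather than blocking the inference outright.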
submitted version [PDF]
A Proposal on Evaluation Measures for RTE. Proceedings of the 2009 Workshop on Applied Textual Inference (TextInfer). August 2009. Singapore.
McPIET at RTE-4: Robust Inference and Logical Pattern Processing Based on Integrated Deep and Shallow Semantics. Proceedings of the Text Analysis Conference (TAC '08). November 2008. Gaithersburg, MD.
Some Notes on the Economics and Evaluation of Automatic Retrieval and Filtering of Communication Goods. unpublished manuscript, November 2007. talk presented at the Software Competence Center Hagenberg, Hagenberg, September 2007.
A Comprehensive Bibliography of Linguistic Steganography. Edward J. Delp and Ping Wah Wong (eds.), Proceedings of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents, volume 6505, January 2007.
Closed Domain Question Answering Using Fuzzy Semantics. thesis submitted to the University of Cambridge in partial fulfillment of the degree requirements for the Master of Philosophy, July 2006. talk at the Cambridge NLIP Group Friday Seminar, Cambridge, November 2006. talk at the 23rd Chaos Communication Congress, Berlin, December 2006.
(with U. Bodenhofer) Syntax-Driven Analysis of Context-Free Languages with Respect to Fuzzy Relational Semantics. Proceedings of the 15th IEEE International Conference on Fuzzy Systems, pages 9647-9654, Vancouver, July 2006. best session paper award. Technical Report 0601, Software Competence Center Hagenberg, July 2006. Technical Report UCAM-CL-TR-663, University of Cambridge, Computer Laboratory, March 2006.
(with S. Katzenbeisser) Content-Aware Steganography: About Lazy Prisoners and Narrow-Minded Wardens. Proceedings of the 8th Information Hiding Conference, volume 4437 of the Lecture Notes in Computer Science, Springer Verlag, 2007. Technical Report FKI-252-05, Technische Universität München, Institut für Informatik AI/Cognition Group, December 2005.
(with S. Katzenbeisser) Towards Human Interactive Proofs in the Text-Domain. Kan Zhang and Yuliang Zheng (eds.), Proceedings of the 7th Information Security Conference, volume 3225 of the Lecture Notes in Computer Science, pages 257-267, Springer Verlag, September 2004.
Towards Linguistic Steganography: A Systematic Investigation of Approaches, Systems, and Issues. final year thesis submitted to the University of Derby in partial fulfillment of the degree requirements for the Bachelor of Science, April 2004. talk at the 21st Chaos Communication Congress, Berlin, December 2004.
Some Experimental Results on Feed-Forward Networks for Text Classification. coursework submitted to the University of Derby in partial fulfillment of the degree requirements for the Bachelor of Science, May 2004.
Ethical Lessons Learned from Computer Science. ACM Crossroads, 10(3):23-28, February 2004. reprinted as ACM Crossroads, 14(4):17-21, June 2008.
A Summary of Traditional Approaches to Natural Language Processing. Diplomarbeit submitted to the Höhere Technische Bundeslehranstalt Leonding in partial fulfillment of the degree requirements for the Diplom HTL Ingenieur, May 2003.