Archive for the 'Graphical Models' Category

Zhemin Zhu defends PhD thesis on Co-occurrence Rate Networks

Friday, October 16th, 2015, posted by Djoerd Hiemstra

Co-occurrence Rate Networks: Towards separate training for undirected graphical models

by Zhemin Zhu

Dependence is a universal phenomenon which can be observed everywhere. In machine learning, probabilistic graphical models (PGMs) represent dependence relations with graphs. PGMs find wide applications in natural language processing (NLP), speech processing, computer vision, biomedicine, information retrieval, etc. Many traditional models, such as hidden Markov models (HMMs) and Kalman filters, can be put under the umbrella of PGMs. The central idea of PGMs is to decompose (factorize) a joint probability into a product of local factors. Learning, inference and storage can then be conducted efficiently over this factorized representation.
Two major types of PGMs can be distinguished: (i) Bayesian networks (directed graphs), and (ii) Markov networks (undirected graphs). Bayesian networks represent directed dependence with directed edges; their local factors are conditional probabilities. Directed dependence, directed edges and conditional probabilities are all asymmetric notions. In contrast, Markov networks represent mutual dependence with undirected edges. Both mutual dependence and undirected edges are symmetric notions. For general Markov networks, by the Hammersley–Clifford theorem, the local factors are positive functions over maximal cliques. These local factors are explained using intuitive notions like ‘compatibility’ or ‘affinity’. In particular, if a graph forms a clique tree, the joint probability can be reparameterized into a junction tree factorization.
In this thesis, we propose a novel framework motivated by the Minimum Shared Information Principle (MSIP): we try to find a factorization in which the information shared between factors is minimal. In other words, we try to make the factors as independent as possible.
The benefit of doing this is that we can train factors separately without spending much effort to guarantee consistency between them. To achieve this goal, we develop a theoretical framework called co-occurrence rate networks (CRNs) to obtain such a factorization. Briefly, given a joint probability, the CRN factorization is obtained as follows. We first strip off the singleton probabilities from the joint probability. The quantity left is called the co-occurrence rate (CR). CR is a symmetric quantity which measures the mutual dependence among the variables involved. Then we further decompose the joint CR into smaller and independent CRs. Finally, we obtain a CRN factorization whose factors consist of all singleton probabilities and CR factors. Two kinds of independence hold between these factors (here, independent means that two factors do not share information): (i) a singleton probability is independent of the other singleton probabilities; (ii) a CR factor is independent of the other CR factors conditioned on the singleton probabilities. Based on a CRN factorization, we propose an efficient two-step separate training method: (i) in the first step, we train a separate model for each singleton probability; (ii) given the singleton probabilities, we train a separate model for each CR factor. Experimental results on three important natural language processing tasks show that our separate training method is two orders of magnitude faster than conditional random fields, while achieving competitive quality (often better on the overall quality metric F1).
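
To make the factorization concrete, here is a minimal sketch (my own illustration, not code from the thesis) for a linear chain: the co-occurrence rate of two variables is their joint probability divided by the product of their singleton probabilities, and the joint probability factorizes into singleton probabilities times pairwise CRs.

```python
from collections import Counter

# Toy label sequences; in the thesis these would be, e.g., tag sequences.
sequences = [["DT", "NN", "VB"], ["DT", "JJ", "NN"], ["NN", "VB", "DT"]]

unigrams = Counter(x for seq in sequences for x in seq)
bigrams = Counter(pair for seq in sequences for pair in zip(seq, seq[1:]))
n_uni = sum(unigrams.values())
n_bi = sum(bigrams.values())

def p(a):
    # Singleton (marginal) probability, estimated by relative frequency.
    return unigrams[a] / n_uni

def cr(a, b):
    # Co-occurrence rate: CR(a, b) = p(a, b) / (p(a) * p(b)).
    # CR > 1: positive dependence; CR = 1: independence; CR < 1: negative.
    return (bigrams[(a, b)] / n_bi) / (p(a) * p(b))

def crn_score(seq):
    # Linear-chain CRN factorization (exact when the chain is Markov):
    #   p(x_1..x_n) = prod_i p(x_i) * prod_i CR(x_i, x_{i+1})
    score = 1.0
    for x in seq:
        score *= p(x)
    for a, b in zip(seq, seq[1:]):
        score *= cr(a, b)
    return score

print(crn_score(["DT", "NN", "VB"]))
```

Note that each unigram and bigram table above is estimated independently of the others, which is exactly what makes the two-step separate training possible.
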
The second contribution of this thesis is applying PGMs to a real-world NLP application: open relation extraction (ORE). In open relation extraction, two entities in a sentence are given, and the goal is to automatically extract the expression of their relation. ORE is a core technique, especially in the age of big data, for transforming unstructured information into structured data. We propose our model SimpleIE for this task. The basic idea is to decompose an extraction pattern into a sequence of simplification operations (components). The benefit of doing this is that these components can be recombined in new ways to generate new extraction patterns, so SimpleIE can represent and capture diverse extraction patterns. This model is essentially a sequence labeling model. Experimental results on three benchmark data sets show that SimpleIE boosts recall and F1 by at least 15% compared with seven ORE systems.
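
As a feel for the sequence-labeling view (the operation labels below are hypothetical illustrations, not SimpleIE's actual operation inventory): each token between the two given entities is tagged with a simplification operation, and the retained tokens form the extracted relation expression.

```python
# Hypothetical simplification operations: 'K' = keep as part of the
# relation expression, 'D' = drop; 'E1'/'E2' mark the two given entities.
tokens = ["Einstein", "was", "born", "in", "the", "city", "of", "Ulm"]
labels = ["E1",       "K",   "K",    "K",  "D",   "D",    "D",  "E2"]

relation = " ".join(t for t, l in zip(tokens, labels) if l == "K")
print(relation)  # -> "was born in"
```
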
As tangible outputs of this thesis, we contribute open source implementations of our research results as well as an annotated data set.

[download pdf]

Guest speakers at 12th SSR

Tuesday, October 6th, 2015, posted by Djoerd Hiemstra

We are proud to announce the 12th Seminar on Searching and Ranking, with guest presentations by Ingo Frommholz from the University of Bedfordshire, UK, and Tom Heskes from Radboud University Nijmegen, the Netherlands.
More information at: SSR 12.

Linear Co-occurrence Rate Networks for Sequence Labeling

Monday, August 4th, 2014, posted by Djoerd Hiemstra

by Zhemin Zhu, Djoerd Hiemstra, and Peter Apers

Sequence labeling has wide applications in natural language processing and speech processing. Popular sequence labeling models suffer from some known problems: hidden Markov models (HMMs) are generative models that cannot encode transition features; conditional Markov models (CMMs) suffer from the label bias problem; and training conditional random fields (CRFs) can be expensive. In this paper, we propose Linear Co-occurrence Rate Networks (L-CRNs) for sequence labeling, which avoid these problems. The factors of L-CRNs can be locally normalized and trained separately, which leads to a simple and efficient training method. Experimental results on real-world natural language processing data sets show that L-CRNs reduce the training time by orders of magnitude while achieving results very competitive with CRFs.
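
Since each factor is trained on its own, the factors only meet at decoding time. A minimal sketch of Viterbi decoding over separately estimated singleton and CR factors (my own simplification: no input features, just label factors):

```python
import math

# log_p[y]: log singleton probability of label y;
# log_cr[(a, b)]: log co-occurrence rate of adjacent labels a, b.
def viterbi(n, labels, log_p, log_cr):
    best = [{y: log_p[y] for y in labels}]   # best[i][y]: best prefix score
    back = []                                # backpointers
    for _ in range(1, n):
        prev, cur, ptr = best[-1], {}, {}
        for y in labels:
            a = max(labels, key=lambda a: prev[a] + log_cr[(a, y)])
            cur[y] = prev[a] + log_cr[(a, y)] + log_p[y]
            ptr[y] = a
        best.append(cur)
        back.append(ptr)
    y = max(labels, key=lambda y: best[-1][y])
    seq = [y]
    for ptr in reversed(back):               # follow backpointers
        y = ptr[y]
        seq.append(y)
    return seq[::-1]

labels = ["A", "B"]
log_p = {"A": math.log(0.6), "B": math.log(0.4)}
log_cr = {("A", "A"): math.log(0.5), ("A", "B"): math.log(1.8),
          ("B", "A"): math.log(1.8), ("B", "B"): math.log(0.2)}
print(viterbi(4, labels, log_p, log_cr))  # alternating labels score best
```
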

[download pdf]

The paper will be presented at the International Conference on Statistical Language and Speech Processing (SLSP) in Grenoble, France, on October 14-16, 2014.

Our C++ implementation of L-CRNs and the datasets used in this paper can be found on GitHub.

Comparison of Local and Global Undirected Graphical Models

Tuesday, July 29th, 2014, posted by Djoerd Hiemstra

by Zhemin Zhu, Djoerd Hiemstra, Peter Apers, and Andreas Wombacher

Conditional Random Fields (CRFs) are discriminative undirected models which are globally normalized. Global normalization protects CRFs from the label bias problem (LBP), from which most local models suffer. The recently proposed co-occurrence rate networks (CRNs) are also discriminative undirected models. In contrast to CRFs, CRNs are locally normalized, yet it has been established that CRNs are immune to the LBP even though they are local models. In this paper, we compare these two models further. We build the connection between CRNs and copulas in the continuous case, and we evaluate their respective strengths and weaknesses statistically through experiments.
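
One way to see the connection the paper builds (a sketch of the standard identity, not the paper's derivation): in the continuous case the co-occurrence rate, i.e. the joint density divided by the product of the marginal densities, is exactly the copula density of Sklar's theorem evaluated at the marginal CDFs:

```latex
\mathrm{CR}(x_1,\dots,x_n)
  = \frac{f(x_1,\dots,x_n)}{\prod_{i=1}^{n} f_i(x_i)}
  = c\bigl(F_1(x_1),\dots,F_n(x_n)\bigr),
```

where f is the joint density, f_i and F_i the marginal densities and distribution functions, and c the copula density.
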

[download pdf]

The paper was presented at the 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN) in Bruges (Belgium) on 23-25 April 2014.

Empirical Co-occurrence Rate Networks For Sequence Labeling

Wednesday, January 15th, 2014, posted by Djoerd Hiemstra

by Zhemin Zhu, Djoerd Hiemstra, Peter Apers, and Andreas Wombacher.

Structured prediction has wide applications in many areas, and powerful, popular models for it have been developed. Despite their successes, they suffer from some known problems: (i) hidden Markov models (HMMs) are generative models which suffer from the mismatch problem, and it is difficult to explicitly incorporate overlapping, non-independent features into an HMM; (ii) conditional Markov models (CMMs) suffer from the label bias problem; (iii) Conditional Random Fields (CRFs) overcome the label bias problem by global normalization, but this global normalization can be expensive, which prevents CRFs from being applied to big data. In this paper, we propose Empirical Co-occurrence Rate Networks (ECRNs) for sequence labeling. ECRNs are discriminative models, so they overcome the problems of HMMs. ECRNs are also immune to the label bias problem even though they are locally normalized. To make the estimation of ECRNs as fast as possible, we simply use the empirical distributions as the parameter estimates. Experiments on two real-world NLP tasks show that ECRNs reduce the training time radically while obtaining accuracy competitive with state-of-the-art models.
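
Using empirical distributions as parameter estimates means training is a single counting pass rather than iterative optimization. A minimal sketch of such a closed-form estimator for one conditional factor (the add-one smoothing is my own choice, not necessarily the paper's):

```python
from collections import Counter

def empirical_conditional(pairs, labels, alpha=1.0):
    """Closed-form 'training': smoothed relative frequencies p(y | x).
    pairs: iterable of (x, y) observations, e.g. (word, tag)."""
    joint = Counter(pairs)
    ctx = Counter()
    for (x, _), c in joint.items():
        ctx[x] += c
    def p(y, x):
        return (joint[(x, y)] + alpha) / (ctx[x] + alpha * len(labels))
    return p

p = empirical_conditional(
    [("bank", "NN"), ("bank", "NN"), ("bank", "VB")], labels={"NN", "VB"})
print(p("NN", "bank"))  # (2 + 1) / (3 + 2) = 0.6
```
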

Presented at the International Conference on Machine Learning and Applications (ICMLA) in Miami, Florida.

[download pdf]

Debasis Ganguly successfully defends PhD thesis on Topical Relevance Models

Wednesday, August 28th, 2013, posted by Djoerd Hiemstra

Today, Debasis Ganguly successfully defended his PhD thesis at Dublin City University.

Topical Relevance Models

by Debasis Ganguly

An inherent characteristic of information retrieval (IR) is that the query expressing a user’s information need is often multi-faceted, that is, it encapsulates more than one specific potential sub-information need. This multi-facetedness of queries manifests itself as a topic distribution in the retrieved set of documents, where each document can be considered as a mixture of topics, one or more of which may correspond to the sub-information needs expressed in the query. In some specific domains of IR, such as patent prior art search, where the queries are full patent articles and the objective is to (in)validate the claims contained therein, the queries themselves are multi-topical in addition to the retrieved set of documents. The overall objective of the research described in this thesis is to investigate techniques to recognize and exploit this multi-topical characteristic of the retrieved documents and the queries, both in retrieval and in relevance feedback.
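
To illustrate the notion of a multi-faceted query as a topic distribution (my own minimal sketch using off-the-shelf LDA, not the relevance models developed in the thesis):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "patent claims a battery electrode coating method",
    "search engine ranking of patent documents",
    "battery charging circuit for electric vehicles",
]
vec = CountVectorizer()
X = vec.fit_transform(docs)

# Fit a tiny topic model; each document becomes a mixture of topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

query = ["battery electrode patent ranking"]  # multi-faceted: two sub-needs
theta = lda.transform(vec.transform(query))[0]
print(theta)  # the query's topic distribution; each mass ~ one facet
```
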

SemEval’s Sentiment Analysis in Twitter

Friday, July 5th, 2013, posted by Djoerd Hiemstra

UT-DB: An Experimental Study on Sentiment Analysis in Twitter

Zhemin Zhu, Djoerd Hiemstra, Peter Apers, and Andreas Wombacher

This paper describes our system for participating in SemEval 2013 Task 2-B: Sentiment Analysis in Twitter. Given a message, our system classifies whether the message carries positive, negative or neutral sentiment. It uses a co-occurrence rate model. The training data are constrained to the data provided by the task organizers (no other tweet data are used). We consider 9 types of features and use a subset of them in our submitted system. To see the contribution of each type of feature, we conduct an experimental study in which we leave out one type of feature at a time. The results suggest that unigrams are the most important features, bigrams and POS tags do not seem helpful, and stopwords should be retained to achieve the best results. The overall results of our system are promising, given the constrained features and data we use.
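
The leave-one-type-out study has a simple shape; a sketch of the loop (the feature-type names loosely follow the paper's findings, and evaluate() is a placeholder standing in for the real train/test pipeline of the co-occurrence rate model):

```python
FEATURE_TYPES = ["unigrams", "bigrams", "pos_tags", "stopwords", "emoticons"]

def evaluate(feature_types):
    # Placeholder scorer: pretend unigrams matter most, as the paper found.
    # Replace with the actual system to reproduce the study.
    score = 0.50
    if "unigrams" in feature_types:
        score += 0.10
    if "stopwords" in feature_types:   # keeping stopwords helped
        score += 0.02
    return score

full = evaluate(FEATURE_TYPES)
for held_out in FEATURE_TYPES:
    kept = [f for f in FEATURE_TYPES if f != held_out]
    print(f"without {held_out}: delta = {full - evaluate(kept):+.3f}")
```
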

[download pdf]

Empirical Training for Conditional Random Fields

Tuesday, February 26th, 2013, posted by Djoerd Hiemstra

A Closed Form Maximum Likelihood Estimator Of Conditional Random Fields

by Zhemin Zhu, Djoerd Hiemstra, Peter Apers and Andreas Wombacher

Training Conditional Random Fields (CRFs) can be very slow for big data. In this paper, we present a new training method for CRFs called empirical training, which is motivated by the concept of co-occurrence rate. We show that the standard (unregularized) training can have many maximum likelihood estimates (MLEs). Empirical training has a unique closed-form MLE, which is also an MLE of the standard training. We are the first to identify the test time problem of the standard training, which may lead to low accuracy. Empirical training is immune to this problem. Empirical training is also unaffected by the label bias problem even though it is locally normalized. All of these claims have been verified by experiments. Experiments also show that empirical training reduces the training time from weeks to seconds, and obtains results competitive with standard and piecewise training on linear-chain CRFs, especially when data are insufficient.
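
The source of the speed-up is that the MLE is available in closed form. As a simplified stand-in for the paper's factors: for a single multinomial factor with observed counts n(y), maximizing the log-likelihood sum_y n(y) log theta_y subject to sum_y theta_y = 1 gives the relative frequency

```latex
\hat{\theta}_y = \frac{n(y)}{\sum_{y'} n(y')},
```

so “training” reduces to counting, whereas standard CRF training must iterate gradient steps through a global partition function.
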

[download pdf]

Conditional Random Fields on Steroids

Monday, December 3rd, 2012, posted by Djoerd Hiemstra

I have never been more excited about a paper that I contributed to! In this technical report Zhemin Zhu introduces a new theory for factorizing undirected graphical models, with astonishing results, reducing the training time for conditional random fields from weeks to seconds on a part-of-speech tagging task. Reducing the training time from weeks to seconds is like approaching the moon up to a distance of about 100 meters, or buying a Ferrari F12 for 10 cents!!

Separate Training for Conditional Random Fields Using Co-occurrence Rate Factorization

by Zhemin Zhu, Djoerd Hiemstra, Peter Apers, and Andreas Wombacher

The standard training method of Conditional Random Fields (CRFs) is very slow for large-scale applications. In this paper, we present separate training for undirected models based on the novel Co-occurrence Rate Factorization (CR-F). Separate training is a local training method. In contrast to piecewise training, separate training is exact. In contrast to MEMMs, separate training is unaffected by the label bias problem. Experiments show that separate training (i) is unaffected by the label bias problem; (ii) reduces the training time from weeks to seconds; and (iii) obtains results competitive with standard and piecewise training on linear-chain CRFs.
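
For a linear chain, the reason separate training can be exact is a reparameterization identity (a sketch using the co-occurrence rate notion; the conditioning on x reflects the discriminative setting):

```latex
p(y_1,\dots,y_n \mid x)
  = \prod_{i=1}^{n} p(y_i \mid x)
    \prod_{i=1}^{n-1} \frac{p(y_i, y_{i+1} \mid x)}
                           {p(y_i \mid x)\, p(y_{i+1} \mid x)},
```

which holds exactly whenever the label sequence is Markov given x. Each factor on the right can be normalized and trained locally, with no global partition function, which is what distinguishes this from approximate local schemes such as piecewise training.
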

[download pdf]