Learning Semantic Knowledge from Wikipedia: Learning Concept Hierarchies from Document Categories

In this study, researchers exploit rich, naturally-occurring structures on Wikipedia for various NLP tasks.

Author: Mingda Chen

WikiNLI gives the best performance when averaging over 8 tasks for both BERT and RoBERTa. We perform an in-depth analysis of approaches to handling the Wikipedia category graph and of the effects of pretraining with WikiNLI, showing more significant gains for the task requiring higher-level conceptual knowledge. Since WikiNLI uses category pairs in which one is a hyponym of the other, it is closely related to work on extracting hyponym-hypernym pairs from text. Pavlick et al. automatically generate a large-scale phrase pair dataset with several relationships by training classifiers on a relatively small amount of human-annotated data. However, most of this prior work uses raw text, or raw text combined with either annotated data or curated resources like WordNet.
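In practice, pretraining on WikiNLI amounts to an intermediate finetuning step: the encoder is first trained as a three-way phrase-pair classifier on WikiNLI and then finetuned on each downstream task as usual. The sketch below illustrates that recipe with Hugging Face Transformers; the label names, file path, model choice, and single-example training loop are illustrative assumptions rather than the exact configuration used in the thesis.

```python
# Minimal sketch of intermediate pretraining on WikiNLI-style phrase pairs.
# Assumptions (not from the thesis): data lives in a TSV of (phrase_a, phrase_b, label),
# labels are child/parent/neutral, and training runs one example at a time.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)
LABELS = {"child": 0, "parent": 1, "neutral": 2}  # assumed three-way label set

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
with open("wikinli_train.tsv") as f:  # hypothetical path
    for line in f:
        phrase_a, phrase_b, label = line.rstrip("\n").split("\t")
        # Phrase pairs are encoded the same way as NLI premise/hypothesis pairs.
        batch = tokenizer(phrase_a, phrase_b, truncation=True, return_tensors="pt")
        loss = model(**batch, labels=torch.tensor([LABELS[label]])).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# The checkpoint is then finetuned on each downstream NLI task (e.g., RTE or MNLI)
# exactly as one would finetune vanilla BERT.
model.save_pretrained("bert-base-uncased-wikinli")
```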

The Multi-Genre Natural Language Inference (MNLI) dataset is a human-annotated multi-domain NLI dataset. to form input data pairs. We include this dataset for more fine-grained evaluation. Since there is no standard development or testing set for this dataset, we randomly sample 60%/20%/20% of it as our train/dev/test sets. Break. Glockner et al. constructed a challenging NLI dataset called “Break” using external knowledge bases such as WordNet. Since sentence pairs in the dataset differ by only one or two words, similar to a pair of adversarial examples, it has broken many NLI systems.

All three datasets are constructed from their corresponding parent-child relationship pairs. Neutral pairs are first randomly sampled from non-ancestor-descendant relationships, and then the top-ranked pairs according to cosine similarities of ELMo embeddings are kept. We also ensure that these datasets are balanced among the three classes. Code and data are available at https://github.com/ZeweiChu/WikiNLI.

4.3.4 Experimental Results

The results are summarized in Table 4.10.
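Before turning to the detailed results, here is a concrete illustration of the construction just described (parent-child pairs plus similarity-filtered neutral pairs). In the sketch below, `embed` is a stand-in for the ELMo embedder mentioned in the text and `is_ancestor` is a hypothetical helper over the category graph; both are assumptions of this sketch, not part of the released code.

```python
# Sketch of neutral-pair construction: sample category pairs that are not in an
# ancestor-descendant relation, then keep the pairs whose embeddings are most similar.
import random
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def sample_neutral_pairs(categories, is_ancestor, embed, num_candidates=100_000, num_keep=10_000):
    """categories: list of category names.
    is_ancestor(a, b): True if a is an ancestor of b in the category graph (hypothetical helper).
    embed(text) -> np.ndarray: phrase embedding (stand-in for ELMo)."""
    candidates = []
    while len(candidates) < num_candidates:
        a, b = random.sample(categories, 2)
        if not is_ancestor(a, b) and not is_ancestor(b, a):
            candidates.append((a, b))
    # Keep the top-ranked candidates by cosine similarity so that neutral pairs are
    # topically related rather than trivially unrelated.
    scored = sorted(candidates, key=lambda p: cosine(embed(p[0]), embed(p[1])), reverse=True)
    return [(a, b, "neutral") for a, b in scored[:num_keep]]
```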

WikiNLI can lead to much more substantial gains than the other two resources. To better understand the differences among WikiNLI, Wikidata, and WordNet, we list the top 20 most frequent words in these three resources in Table 4.11. Interestingly, WordNet contains mostly abstract words, such as “unit”, “family”, and “person”, while Wikidata contains many domain-specific words, such as “protein” and “gene”. In contrast,
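The frequency comparison behind Table 4.11 is straightforward to reproduce. The sketch below assumes each resource is available as a list of phrase pairs and uses plain whitespace tokenization, which may differ from the tokenization used for the original table.

```python
# Count the most frequent words in a resource given as (phrase_a, phrase_b) pairs.
from collections import Counter

def top_words(pairs, k=20):
    counts = Counter()
    for a, b in pairs:
        counts.update(a.lower().split())
        counts.update(b.lower().split())
    return [word for word, _ in counts.most_common(k)]

# e.g. top_words(wikinli_pairs), top_words(wordnet_pairs), top_words(wikidata_pairs)
```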

, RTE, and for one epoch, as it leads to better performance on downstream tasks. The results are in Table 4.13. We observe that except for All the variants of WikiNLI we have considered so far use categories as the lowest level of the hierarchies. We are interested in whether adding Wikipedia page titles would bring in additional knowledge for inference tasks.

Wikipedia Pages, Mentions, and Layer Pruning. We experiment with including Wikipedia page titles that belong to Wikipedia categories to

category hierarchies. Table 4.14 compares the results of adding page titles and pruning different numbers of layers. Adding page titles mostly gives relatively small improvements to model performance on downstream tasks, which shows that the page title is not a useful addition to WikiNLI. is also balanced among the three relations, and we experiment with 100k training instances and 5k development instances. Table 4.15 shows some examples from

does not help on the downstream tasks. It is worth noting that the differences between by treating the mentions and page title layer as the same level. This effectively gives models pretrained on this version of which seems to impair the model performance greatly. This also validates our claim that specific knowledge tends to be noisy and less likely to be helpful for downstream tasks. More interestingly, these variants seem to affect Break the most, which is in line with our previous finding that Break favors higher-level knowledge. While most of our findings with sentential context are negative, the

with 50k instances of Wikidata. Table 4.17 compares these two settings for pretraining. In WikiNLI, “company” is under the “business” category and “debt” is under the “finance” category; they are not directly related. In WordNet, due to the polysemy of “company”, “company” and “debt” are both hyponyms of “state”, and in Wikidata, they are both a subclass of “legal concept”. For the phrase pair “family”/“woman”, in

or categories are still quite rare. Evaluating on Adversarial NLI. Adversarial NLI is collected via an iterative human-and-model-in-the-loop procedure. on average, in which case our phrase-based resources or pretraining approach are not optimal choices. Future research may focus on finding better ways to incorporate sentential context into WikiNLI.

to other languages. We mostly follow the same procedures as English as “m Wikidata, and WordNet, finetune and evaluate them on other languages using the same language-specific 3000 NLI pairs mentioned earlier. We note that when pretraining on m could be partly attributed to domain differences across languages. To measure the differences, we compile a list of the top 20 most frequent words in the Chinese m

benefits non-English languages more than training on the language-specific m we pretrain BERT and RoBERTa on brings consistent improvements in a low-resource NLI setting where there are limited amounts of training data, and the improvements plateau as the number of training instances increases; more

in other languages and benchmark several resources on XNLI, showing that we pretrain BERT and RoBERTa on SciTail. SciTail is created from science questions and the corresponding answer candidates, with premises coming from relevant web sentences retrieved from a large corpus. SciTail has two categories: entailment and neutral. Similar to

Since Break does not have a training split, we use the aforementioned subsampled training data.

Additionally, we try to add mentions using the hyperlinks in the Wikipedia sentences. More specifically, for a sentence with a hyperlink, we form new sentences by replacing the text mention with the page title as well as with the categories describing that page. We consider these two sentences as forming candidate child-parent relationship pairs. An example is shown in Fig. 4.5.
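A minimal sketch of this mention-replacement step is given below. The function signature and the example call are illustrative assumptions; they are not taken from the released code.

```python
# For a sentence containing a hyperlink, build candidate child-parent sentence pairs:
# the "child" replaces the mention with the page title, and each "parent" replaces it
# with one of the categories describing the linked page.
def child_parent_pairs_from_hyperlink(sentence, mention, page_title, page_categories):
    child = sentence.replace(mention, page_title, 1)
    pairs = []
    for category in page_categories:
        parent = sentence.replace(mention, category, 1)
        pairs.append((child, parent, "child-parent"))
    return pairs

# Hypothetical example:
# child_parent_pairs_from_hyperlink(
#     "She studied at MIT in the 1990s.",
#     mention="MIT",
#     page_title="Massachusetts Institute of Technology",
#     page_categories=["Universities and colleges in Massachusetts"])
```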

, we observe that adding extra context to are always immediately after Wikipedia pages, limiting the exposure of higher-level categories. To look into the importance of those categories, we construct another version of with 50k instances of WordNet, while in the other setting we combine 50k instances of

WikiNLI has closely related terms “foreign affair ministries” and “foreign minister” under the category “international relations”, whereas WordNet does not have these two, and Wikidata only has “foreign minister”. As another example, consider the phrase pair “company” and “debt”. In when finetuning on 2k, 3k, 5k, 10k, and 20k

narrows as the training data size increases. We hypothesize that the performance gap does not reduce as expected between 3k and 5k or between 10k and 20k, due in part to the imbalanced number of instances available for the categories. For example, even when using 20k training instances, some of the has a context length almost 3 times longer than
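The low-resource comparison described here boils down to subsampling the downstream training set at several sizes and finetuning each checkpoint on every subsample. The sketch below records that protocol; `finetune_and_evaluate` and the checkpoint dictionary are hypothetical placeholders for a standard classification finetuning loop.

```python
# Compare checkpoints (e.g. vanilla BERT vs. BERT pretrained on WikiNLI) across
# several downstream training-set sizes.
import random

TRAIN_SIZES = [2_000, 3_000, 5_000, 10_000, 20_000]

def low_resource_curve(train_examples, dev_examples, checkpoints, finetune_and_evaluate, seed=42):
    rng = random.Random(seed)
    results = {}
    for size in TRAIN_SIZES:
        subsample = rng.sample(train_examples, size)
        for name, checkpoint in checkpoints.items():  # e.g. {"bert": ..., "bert+wikinli": ...}
            results[(name, size)] = finetune_and_evaluate(checkpoint, subsample, dev_examples)
    return results
```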

Wikipedia has versions in different languages, which naturally motivates us to extend WikiNLI to other languages. We will refer to this version of into other languages using Google Translate. We benchmark these resources on XNLI in four languages: French, Arabic, Urdu, and Chinese. When reporting these results, we pretrain multilingual BERT on the corresponding resources, finetune it on 3000 instances of the training set, perform early stopping on the development set, and test it on the test set. We always use the XNLI data from the corresponding language.
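Putting the cross-lingual protocol together: translate a phrase-pair resource into the target language, pretrain multilingual BERT on it, finetune on 3000 language-specific XNLI training instances with early stopping on the development set, and evaluate on the test set. In the sketch below, `translate` stands in for the Google Translate call, and `pretrain`, `finetune`, and `evaluate` are hypothetical helpers wrapping standard training loops; this is an outline of the procedure, not the released implementation.

```python
# Outline of the cross-lingual benchmarking protocol described above.
XNLI_LANGS = ["fr", "ar", "ur", "zh"]  # French, Arabic, Urdu, Chinese
N_FINETUNE = 3000                      # language-specific XNLI training instances

def translate_pairs(pairs, target_lang, translate):
    # Translate both sides of every phrase pair into the target language.
    return [(translate(a, target_lang), translate(b, target_lang), label)
            for a, b, label in pairs]

def benchmark_resource(resource_pairs, xnli, translate, pretrain, finetune, evaluate):
    scores = {}
    for lang in XNLI_LANGS:
        translated = translate_pairs(resource_pairs, lang, translate)
        model = pretrain("bert-base-multilingual-cased", translated)
        model = finetune(model, xnli[lang]["train"][:N_FINETUNE],
                         dev=xnli[lang]["dev"])  # early stopping on the dev set
        scores[lang] = evaluate(model, xnli[lang]["test"])
    return scores
```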
