MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.

Repository: NTMC-Community/MatchZoo
Stars: 3303
Forks: 868
Watchers: 3303
Current version: v2.2
Total releases: 2
Open issues: 27
First release: 2019-04-04
Latest release: 2019-10-09
Last update: 2020-12-23

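MatchZoo is a general-purpose text matching toolkit that aims to make it easy to quickly implement, compare, and share the latest deep text matching models.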

🔥News: MatchZoo-py (PyTorch version of MatchZoo) is ready now.

The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, such as document retrieval, question answering, conversational response ranking, and paraphrase identification. With a unified data processing pipeline, simplified model configuration, and automatic hyper-parameter tuning, MatchZoo is flexible and easy to use.

| Tasks | Text 1 | Text 2 | Objective |
| --- | --- | --- | --- |
| Paraphrase Identification | string 1 | string 2 | classification |
| Textual Entailment | text | hypothesis | classification |
| Question Answer | question | answer | classification/ranking |
| Conversation | dialog | response | classification/ranking |
| Information Retrieval | query | document | ranking |

Get Started in 60 Seconds

To train a Deep Structured Semantic Model (DSSM), import matchzoo and prepare the input data.

import matchzoo as mz

train_pack = mz.datasets.wiki_qa.load_data('train', task='ranking')
valid_pack = mz.datasets.wiki_qa.load_data('dev', task='ranking')

Preprocess your input data in three lines of code, and keep track of the parameters to be passed into the model.

preprocessor = mz.preprocessors.DSSMPreprocessor()
train_processed = preprocessor.fit_transform(train_pack)
valid_processed = preprocessor.transform(valid_pack)
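The split between `fit_transform` on the training pack and `transform` on the validation pack means that vocabulary statistics come from training data only. As a rough illustration of that pattern, and not of MatchZoo's actual implementation, the sketch below builds a letter-trigram vocabulary (the word-hashing idea from the DSSM paper) on training text and reuses it for validation text. The class `TriLetterPreprocessor`, the function `tri_letters`, and the sample sentences are hypothetical names and data introduced for this example.

```python
from collections import Counter


def tri_letters(word):
    """Split a word into letter trigrams, using '#' as a boundary mark."""
    marked = f"#{word.lower()}#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


class TriLetterPreprocessor:
    """Hypothetical helper for illustration; not part of MatchZoo."""

    def __init__(self):
        self.vocab = {}

    def fit(self, texts):
        # Collect every trigram seen in the training texts.
        grams = sorted({g for t in texts for w in t.split() for g in tri_letters(w)})
        self.vocab = {g: i for i, g in enumerate(grams)}
        return self

    def transform(self, texts):
        # Bag-of-trigram counts; trigrams unseen during fit are dropped.
        vectors = []
        for t in texts:
            counts = Counter(
                g for w in t.split() for g in tri_letters(w) if g in self.vocab
            )
            vec = [0] * len(self.vocab)
            for g, c in counts.items():
                vec[self.vocab[g]] = c
            vectors.append(vec)
        return vectors

    def fit_transform(self, texts):
        return self.fit(texts).transform(texts)


prep = TriLetterPreprocessor()
train_vecs = prep.fit_transform(["good day"])   # vocabulary is built here
valid_vecs = prep.transform(["good night"])     # vocabulary is only reused here
```

In this sketch the validation vector has the same length as the training vector, and the trigrams of "night" are ignored because they were never seen during fitting.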

Make use of MatchZoo customized loss functions and evaluation metrics:

ranking_task = mz.tasks.Ranking(loss=mz.losses.RankCrossEntropyLoss(num_neg=4))
ranking_task.metrics = [
    mz.metrics.NormalizedDiscountedCumulativeGain(k=3),
    mz.metrics.MeanAveragePrecision()
]
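To make the loss and the two metrics above more concrete, here is a minimal pure-Python sketch of what they measure. It assumes a softmax cross-entropy over one positive score and its sampled negatives, NDCG with gain `2^label - 1` and a log2 discount, and binary relevance for average precision. MatchZoo's own implementations may differ in details such as the gain function or the logarithm base, and the function names below are hypothetical.

```python
import math


def rank_cross_entropy(pos_score, neg_scores):
    """Softmax cross-entropy over one positive and its sampled negatives."""
    scores = [pos_score] + list(neg_scores)
    peak = max(scores)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_sum - pos_score  # equals -log(softmax probability of the positive)


def _rank_labels(labels, scores):
    """Return labels ordered by descending score."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return [rel for _, rel in pairs]


def ndcg_at_k(labels, scores, k):
    """NDCG@k with gain 2^label - 1 and a log2 position discount."""
    def dcg(ordered):
        return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(ordered[:k]))

    ideal = dcg(sorted(labels, reverse=True))
    return dcg(_rank_labels(labels, scores)) / ideal if ideal > 0 else 0.0


def average_precision(labels, scores):
    """Average precision for one query; labels > 0 count as relevant."""
    hits, total = 0, 0.0
    for position, rel in enumerate(_rank_labels(labels, scores), start=1):
        if rel > 0:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


# Toy data for a single query (made up for this example).
labels = [0, 1, 0, 1]
scores = [0.9, 0.8, 0.3, 0.1]
loss = rank_cross_entropy(2.0, [0.0, 0.0, 0.0, 0.0])  # one positive, four negatives
ndcg3 = ndcg_at_k(labels, scores, k=3)
ap = average_precision(labels, scores)
```

Mean average precision is then the mean of `average_precision` over all queries in the evaluation set.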

Initialize the model and fine-tune the hyper-parameters.

model = mz.models.DSSM()
model.params['input_shapes'] = preprocessor.context['input_shapes']
model.params['task'] = ranking_task
model.guess_and_fill_missing_params()
model.build()
model.compile()

Generate pair-wise training data on the fly, and evaluate model performance on the validation data using customized callbacks.

train_generator = mz.PairDataGenerator(train_processed, num_dup=1, num_neg=4, batch_size=64, shuffle=True)
valid_x, valid_y = valid_processed.unpack()
# valid_x is a dict of input arrays, so take the number of examples from valid_y.
evaluate = mz.callbacks.EvaluateAllMetrics(model, x=valid_x, y=valid_y, batch_size=len(valid_y))
history = model.fit_generator(train_generator, epochs=20, callbacks=[evaluate], workers=5, use_multiprocessing=False)
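The pair-wise generation step can be pictured as grouping each positive document with a fixed number of sampled negatives for the same query. The sketch below is a simplified illustration under stated assumptions: binary labels, sampling with replacement, and a hypothetical function `make_pairwise_groups` with made-up query and document ids. It is not the code behind `mz.PairDataGenerator`, although `num_dup` and `num_neg` are meant to play the same roles as in the call above.

```python
import random


def make_pairwise_groups(relations, num_dup=1, num_neg=4, seed=0):
    """Group each positive document with `num_neg` sampled negatives.

    `relations` is a list of (query_id, doc_id, label) tuples.
    Returns a list of (query_id, pos_doc_id, [neg_doc_ids]) tuples.
    """
    rng = random.Random(seed)
    by_query = {}
    for query_id, doc_id, label in relations:
        by_query.setdefault(query_id, []).append((doc_id, label))

    groups = []
    for query_id, docs in by_query.items():
        positives = [d for d, label in docs if label > 0]
        negatives = [d for d, label in docs if label <= 0]
        if not positives or not negatives:
            continue  # a query needs both kinds to form a training group
        for pos in positives:
            for _ in range(num_dup):
                # Sample with replacement so short candidate lists still work.
                sampled = [rng.choice(negatives) for _ in range(num_neg)]
                groups.append((query_id, pos, sampled))
    return groups


relations = [
    ("q1", "d1", 1),
    ("q1", "d2", 0),
    ("q1", "d3", 0),
    ("q2", "d4", 0),  # no positive document, so q2 yields no groups
]
groups = make_pairwise_groups(relations, num_dup=2, num_neg=4)
```

Each group holds one positive and `num_neg` negatives, which is the shape a ranking loss over one positive and its negatives expects.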

References

Tutorials

English Documentation

Chinese Documentation

If you're interested in cutting-edge research progress, please take a look at awesome neural models for semantic match.

Install

MatchZoo depends on TensorFlow. There are two ways to install MatchZoo:

Install MatchZoo from PyPI:

pip install matchzoo

Install MatchZoo from the GitHub source:

git clone https://github.com/NTMC-Community/MatchZoo.git
cd MatchZoo
python setup.py install

Models

  1. DRMM: A Deep Relevance Matching Model for Ad-hoc Retrieval

  2. MatchPyramid: Text Matching as Image Recognition

  3. ARC-I: Convolutional Neural Network Architectures for Matching Natural Language Sentences

  4. DSSM: Learning Deep Structured Semantic Models for Web Search using Clickthrough Data

  5. CDSSM: Learning Semantic Representations Using Convolutional Neural Networks for Web Search

  6. ARC-II: Convolutional Neural Network Architectures for Matching Natural Language Sentences

  7. MV-LSTM: A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations

  8. aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model

  9. DUET: Learning to Match Using Local and Distributed Representations of Text for Web Search

  10. K-NRM: End-to-End Neural Ad-hoc Ranking with Kernel Pooling

  11. CONV-KNRM: Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search

  12. Models under development: BiMPM ....

Citation

If you use MatchZoo in your research, please cite it with the following BibTeX entry.

@inproceedings{Guo:2019:MLP:3331184.3331403,
 author = {Guo, Jiafeng and Fan, Yixing and Ji, Xiang and Cheng, Xueqi},
 title = {MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching},
 booktitle = {Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
 series = {SIGIR'19},
 year = {2019},
 isbn = {978-1-4503-6172-9},
 location = {Paris, France},
 pages = {1297--1300},
 numpages = {4},
 url = {http://doi.acm.org/10.1145/3331184.3331403},
 doi = {10.1145/3331184.3331403},
 acmid = {3331403},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {matchzoo, neural network, text matching},
} 

Development Team


  • Fan Yixing: Core Dev (ASST PROF, ICT)
  • Wang Bo: Core Dev (M.S. TU Delft)
  • Wang Zeyi: Core Dev (B.S. UC Davis)
  • Pang Liang: Core Dev (ASST PROF, ICT)
  • Yang Liu: Core Dev (PhD. UMASS)
  • Wang Qinghua: Documentation (B.S. Shandong Univ.)
  • Wang Zizhen: Dev (M.S. UCAS)
  • Su Lixin: Dev (PhD. UCAS)
  • Yang Zhou: Dev (M.S. CQUT)
  • Tian Junfeng: Dev (M.S. ECNU)

Contribution

Please make sure to read this awesome list!

Thank you to all the people who already contributed to MatchZoo!

Mike Kellogg

Project Organizers

  • Jiafeng Guo
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage
  • Yanyan Lan
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage
  • Xueqi Cheng
    • Institute of Computing Technology, Chinese Academy of Sciences
    • Homepage

License

Apache-2.0

Copyright (c) 2015-present, Yixing Fan (faneshion)