Overview
This paper provided the conceptual unification that the graph neural network field needed. By showing that seemingly different architectures -- GCN, Gated Graph Neural Networks, Interaction Networks, molecular graph convolutions -- are all instances of the same message-update-readout pattern, it gave researchers a common language and framework for designing new GNN variants. The Message Passing Neural Network (MPNN) framework decomposes any such GNN into three functions: a message function that computes information to send along edges, an update function that combines received messages with node state, and a readout function that aggregates node representations into a graph-level prediction.
On the practical side, MPNNs achieved state-of-the-art results on the QM9 molecular property benchmark, predicting 13 quantum chemical properties (energies, dipole moment, polarizability, etc.) from molecular graphs with accuracy approaching density functional theory (DFT) calculations. The paper catalyzed the molecular ML boom from 2017 onward and remains the standard reference for GNN formalism.
The MPNN framework also revealed what the essential design choices are: the message function (how neighbors communicate), the aggregation (how messages are combined -- must be permutation-invariant), and the readout (how node features become graph features). By making these choices explicit, the framework enabled systematic exploration of the GNN design space and provided the vocabulary still used today.
Key Contributions
- MPNN unifying framework: these GNNs decompose into (1) a message function m_t(h_v^t, h_w^t, e_vw), (2) an update function h_v^{t+1} = U_t(h_v^t, M_v^{t+1}), where M_v^{t+1} is the sum of incoming messages, and (3) a readout function y = R({h_v^T}) that aggregates final node representations into a graph-level prediction (a minimal interface sketch follows this list)
- Edge network message function: Messages are computed as A(e_vw) h_w, where A is a learned neural network mapping edge features to a d x d matrix, allowing bond type information to modulate how neighbor features are combined
- Set2Set readout: Replaces simple sum/mean aggregation with an attention-based readout (Vinyals et al.'s set2set model, an outgrowth of pointer networks) that produces permutation-invariant graph-level representations with greater expressiveness
- Virtual graph elements: Introduces techniques like master nodes (connected to all atoms) to improve information flow across the molecular graph
- Systematic benchmark: Evaluates multiple MPNN variants on all 13 QM9 properties with consistent training protocol, establishing a standard molecular ML benchmark methodology
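To make the three-function decomposition concrete, here is a minimal generic driver in PyTorch. The function name, edge-index layout, and shapes are illustrative assumptions, not the paper's code; any model in the framework supplies the three callables.

```python
from typing import Callable
import torch

def mpnn_forward(
    h: torch.Tensor,           # [N, d] initial node states
    edge_index: torch.Tensor,  # [2, E] (source w, destination v) pairs
    e: torch.Tensor,           # [E, d_e] edge features
    message: Callable,         # m_t(h_v, h_w, e_vw) -> per-edge message
    update: Callable,          # U_t(h_v, M_v) -> new node state
    readout: Callable,         # R({h_v^T}) -> graph-level vector
    T: int = 6,
) -> torch.Tensor:
    src, dst = edge_index
    for _ in range(T):
        m = message(h[dst], h[src], e)                 # one message per edge
        M = torch.zeros_like(h).index_add_(0, dst, m)  # permutation-invariant sum
        h = update(h, M)
    return readout(h)
```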
Architecture / Method
┌─────────────────────────────────────────────────────────────┐
│ Molecular Graph Input │
│ Nodes = atoms (features: atomic #, charge) │
│ Edges = bonds (features: bond type, conjugation) │
└────────────────────────┬────────────────────────────────────┘
│
▼ × T rounds (T=6)
┌─────────────────────────────────────────────────────────────┐
│ Message Passing Phase │
│ │
│ For each node v: │
│ ┌──────────┐ ┌──────────────────┐ ┌──────────────┐ │
│ │ Neighbor │───►│ Message function │───►│ Aggregate │ │
│ │ h_w, e_vw │ │ m(h_v, h_w, e_vw)│ │ M_v = Σ m │ │
│ └──────────┘ └──────────────────┘ └──────┬───────┘ │
│ │ │
│ ┌──────────────────────────────────────────────▼───────┐ │
│ │ Update: h_v^{t+1} = GRU(h_v^t, M_v) │ │
│ └──────────────────────────────────────────────────────┘ │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Readout Phase (Set2Set) │
│ LSTM + attention over {h_v^T} ──► graph-level embedding │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ MLP ──► Molecular Property Prediction │
│ (13 QM9 properties: energy, dipole, etc.) │
└─────────────────────────────────────────────────────────────┘
The MPNN framework defines a GNN as operating in two phases: message passing and readout.
Message passing phase (T rounds): At each round t, every node v receives messages from its neighbors N(v). The message function m_t(h_v^t, h_w^t, e_vw) computes a vector for each edge (v,w) based on the receiving node's state, the neighbor's state, and the edge features (e.g., bond type). Messages are aggregated via summation: M_v^{t+1} = sum_{w in N(v)} m_t(h_v^t, h_w^t, e_vw). The update function combines the aggregated message with the current state: h_v^{t+1} = U_t(h_v^t, M_v^{t+1}). In the best-performing variant, U_t is a GRU cell.
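A concrete sketch of one round with sum aggregation and a GRU update (the best-performing update in the paper). Tensor layout and sizes are assumptions; msg_net here is a generic pair-style message MLP, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

d, d_e = 64, 8  # hidden and edge-feature sizes (arbitrary choices)
msg_net = nn.Sequential(nn.Linear(2 * d + d_e, d), nn.ReLU(), nn.Linear(d, d))
gru = nn.GRUCell(input_size=d, hidden_size=d)  # U_t: h_v^{t+1} = GRU(h_v^t, M_v)

def mp_round(h, edge_index, e):
    src, dst = edge_index                 # edge (w -> v): src = w, dst = v
    m = msg_net(torch.cat([h[dst], h[src], e], dim=-1))  # m_t(h_v^t, h_w^t, e_vw)
    M = torch.zeros_like(h).index_add_(0, dst, m)        # M_v^{t+1} = sum over N(v)
    return gru(M, h)                      # GRU(input=M_v, hidden=h_v^t)
```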
The key message-function choices studied are: (a) the edge network variant, m_t(h_v, h_w, e_vw) = A(e_vw) h_w, with A a neural network mapping edge features to a d x d matrix; (b) the pair-message variant, an MLP applied to (h_v, h_w, e_vw) so the message depends on both endpoints; and (c) the GG-NN-style matrix-multiplication baseline, with one learned matrix per discrete edge type.
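A sketch of the edge network message function under the same assumed shapes: a small network maps each edge's features to a d x d matrix that multiplies the neighbor state.

```python
import torch
import torch.nn as nn

d, d_e = 64, 8
A = nn.Sequential(nn.Linear(d_e, d), nn.ReLU(), nn.Linear(d, d * d))

def edge_network_message(h_w, e_vw):
    """m_t(h_v, h_w, e_vw) = A(e_vw) @ h_w; h_w: [E, d], e_vw: [E, d_e]."""
    W = A(e_vw).view(-1, d, d)                          # one d x d matrix per edge
    return torch.bmm(W, h_w.unsqueeze(-1)).squeeze(-1)  # [E, d]
```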
Readout phase: After T rounds of message passing, a readout function R computes a graph-level feature vector y = R({h_v^T | v in G}). The Set2Set readout uses an LSTM with attention over node features for K processing steps, producing an order-invariant representation that is more expressive than simple sum pooling.
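A minimal single-graph sketch of the set2set procedure: an LSTM emits a query, attention over node states produces a readout vector, and query plus readout feed the next step. K, the shapes, and the zero initialization are assumptions for illustration.

```python
import torch
import torch.nn as nn

d, K = 64, 3
lstm = nn.LSTMCell(input_size=2 * d, hidden_size=d)

def set2set(h_nodes):                         # h_nodes: [N, d] final node states
    q_star = h_nodes.new_zeros(2 * d)         # concatenated [query, readout]
    hx = (h_nodes.new_zeros(1, d), h_nodes.new_zeros(1, d))
    for _ in range(K):
        hx = lstm(q_star.unsqueeze(0), hx)
        q = hx[0].squeeze(0)                            # query vector [d]
        alpha = torch.softmax(h_nodes @ q, dim=0)       # attention over nodes
        r = (alpha.unsqueeze(-1) * h_nodes).sum(dim=0)  # attention readout [d]
        q_star = torch.cat([q, r])            # order-invariant embedding, size 2d
    return q_star
```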
For QM9, atoms are nodes with features (atomic number, charge, etc.) and bonds are edges with features (bond type, conjugation). T=6 message passing rounds are used. The graph-level representation is fed through an MLP to predict each of the 13 molecular properties.
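Wiring the pieces together for a single property (a sketch that reuses mp_round and set2set from the blocks above; the output MLP is a hypothetical stand-in):

```python
import torch.nn as nn

readout_mlp = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, 1))

def predict(h, edge_index, e, T=6):    # T = 6 rounds, as in the paper
    for _ in range(T):
        h = mp_round(h, edge_index, e)  # message passing phase
    return readout_mlp(set2set(h))      # Set2Set readout -> scalar property
```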
Results
- Reimplementing GCN, GGNN, and other architectures within the MPNN framework and training them identically reveals that architectural differences (not training details) account for performance gaps
- The best MPNN variant (enn-s2s, combining edge network message function with Set2Set readout) achieves chemical accuracy on 11 out of 13 QM9 targets; an ensemble of the top five models (enn-s2s-ens5) further reduces errors across all targets
- The edge network message function outperforms simpler message functions by 15-30% on most QM9 targets, demonstrating the importance of edge-conditioned communication
- Set2Set readout outperforms sum/mean readout by 5-15% on properties that depend on global molecular structure, validating the attention-based aggregation
- Virtual master nodes improve performance on larger molecules where message passing alone cannot propagate information across the full graph in T steps
Limitations & Open Questions
- Geometry enters only as scalar interatomic distances in the edge features, not full 3D coordinates, so the model cannot distinguish stereoisomers or exploit directional spatial information (addressed by later work: SchNet, DimeNet, NequIP)
- The O(|E|) cost per message passing round (O(|E| * T) overall) limits scalability to very large molecular graphs or materials science applications
- The WL-test expressiveness bound means MPNNs cannot distinguish certain non-isomorphic graphs, placing a theoretical ceiling on representational power (addressed by higher-order GNNs)
Connections
- Machine Learning -- foundational GNN framework
- Pointer Networks -- Set2Set readout borrowed from pointer mechanism
- A Simple Neural Network Module For Relational Reasoning -- complementary approach to relational reasoning
- Attention Is All You Need -- related use of attention; Set2Set's content-based attention predates Transformer self-attention