From d0749fe4934a801a73eada00317a0f5950a166c4 Mon Sep 17 00:00:00 2001
From: Payel Mukhopadhyay
Date: Mon, 22 Jun 2026 15:04:12 -0400
Subject: [PATCH 1/2] README edit

---
 README.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index feac32d..72146a9 100755
--- a/README.md
+++ b/README.md
@@ -78,13 +78,12 @@ internal resolution. The processor consists of blocks containing factorized spac
 ### Patch Jittering
 
 Walrus suppressed the growth of long-run instabilities through the use of *patch jittering*. Patch jittering involves randomly translating the reference frame (with padding for boundaries)
-before each step. While the paper goes into more theoretical detail on why this works, the core idea is that the specific downsampling pattern leads to predictable accumulation
+before each step. While the paper provides a more theoretical explanation of why this works, the core idea is that the specific downsampling pattern leads to predictable accumulation
 of error and that randomizing this process can help alleviate this pathology.
 
 ### Adaptive Compute
 
-To handle varying compute budgets and problem complexities, we also employ [stride modulation](https://arxiv.org/pdf/2507.09264) to allow users to adjust
-their downstream resolution. During pretraining, this was used to keep internal resolution fairly consistent (32/33 per dim in 2D, 16/17 in 3D). In this approach, the downsampling
+To handle varying compute budgets and problem complexities, we also employ [stride modulation](https://arxiv.org/pdf/2507.09264) to allow users to adjust their downstream resolution. During pretraining, this was used to keep internal resolution fairly consistent (32/33 per dim in 2D, 16/17 in 3D). In this approach, the downsampling
 layers of the encoder/decoder will dynamically adjust their stride based on a target internal resolution.
 
 ### Efficient Training

From 7f3b14d176135638ec96bd0d25c86ccd9106d422 Mon Sep 17 00:00:00 2001
From: Payel Mukhopadhyay
Date: Mon, 22 Jun 2026 15:54:53 -0400
Subject: [PATCH 2/2] updated acknowledgements and wording tweaks

---
 README.md | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 72146a9..33f3cde 100755
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@ Walrus schematic
-This repo is built for training and evaluating Walrus, a multi-domain foundation model for continuum dynamics trained primarily on fluid-like behaviors.
+This repo is built for training and evaluating Walrus, a multi-domain foundation model for continuum dynamics trained primarily on fluid-like systems.
 Walrus was trained on 19 different physical scenarios spanning 63 physical variables in both 2 and 3D. Walrus utilizes new tools for adaptive computation and improved stability in order to achieve accurate long-term rollouts while co-adapting sampling and distribution to improve training throughput despite handling varying dimensions, resolutions, and aspect ratios.
@@ -72,18 +72,19 @@ If you already have a local copy of the Well, you can skip straight to example 1
 
 ### Architecture
 
-Walrus uses and encoder-processor-decoder structure. Encoder/decoder are hMLPs/transposed hMLPs using stride modulation to dynamically adjust the
+Walrus uses an encoder-processor-decoder structure. Encoder/decoder are hMLPs/transposed hMLPs using stride modulation to dynamically adjust the
 internal resolution. The processor consists of blocks containing factorized space and time attention.
 
 ### Patch Jittering
 
 Walrus suppressed the growth of long-run instabilities through the use of *patch jittering*. Patch jittering involves randomly translating the reference frame (with padding for boundaries)
-before each step. While the paper provides a more theoretical explanation of why this works, the core idea is that the specific downsampling pattern leads to predictable accumulation
+before each step. While the paper provides more theoretical insights on why this works, the core idea is that the specific downsampling pattern leads to predictable accumulation
 of error and that randomizing this process can help alleviate this pathology.
 
 ### Adaptive Compute
 
-To handle varying compute budgets and problem complexities, we also employ [stride modulation](https://arxiv.org/pdf/2507.09264) to allow users to adjust their downstream resolution. During pretraining, this was used to keep internal resolution fairly consistent (32/33 per dim in 2D, 16/17 in 3D). In this approach, the downsampling
+To handle varying compute budgets and problem complexities, we also employ [stride modulation](https://arxiv.org/pdf/2507.09264) to allow users to adjust
+their downstream resolution. During pretraining, this was used to keep internal resolution fairly consistent (32/33 per dim in 2D, 16/17 in 3D). In this approach, the downsampling
 layers of the encoder/decoder will dynamically adjust their stride based on a target internal resolution.
 
 ### Efficient Training
@@ -105,9 +106,7 @@ For other queries, please reach out to the corresponding author: mmccabe@flatiro
 
 ## Acknowledgements
 
-Walrus is built by [Polymathic AI](https://polymathic-ai.org/) as part of our mission of advancing the frontier of AI for scientific application. Polymathic AI gratefully acknowledges funding from the Simons Foundation and Schmidt Sciences, LLC. This work was performed with compute from the Scientific Computing Core, a
-division of the Flatiron Institute, a division of the Simons Foundation and from the National AI Research Resource Pilot, including support from NVIDIA
-and NVIDIA’s DGX Cloud product which includes the NVIDIA AI Enterprise Software Platform.
+Walrus is built by [Polymathic AI](https://polymathic-ai.org/) as part of our mission of advancing the frontier of AI for scientific applications. Polymathic AI gratefully acknowledges funding from the Simons Foundation and Schmidt Sciences, LLC. This work was supported in part by the AI2050 program at Schmidt Sciences (Grant G-25-70028). Payel Mukhopadhyay thanks the Infosys-Cambridge AI centre for support. This work was performed with compute from the Scientific Computing Core, a division of the Flatiron Institute, a division of the Simons Foundation and from the National AI Research Resource Pilot, including support from NVIDIA and NVIDIA’s DGX Cloud product which includes the NVIDIA AI Enterprise Software Platform.
 
 ## Citing Walrus