diff --git a/.gitignore b/.gitignore index 1735dae..2ae054d 100644 --- a/.gitignore +++ b/.gitignore @@ -128,3 +128,5 @@ Data/Pretrained Data/utils.py Experiment/checkpoint Experiment/log + +*.ckpt \ No newline at end of file diff --git a/ckpts/LICENSE b/ckpts/LICENSE new file mode 100644 index 0000000..5522eea --- /dev/null +++ b/ckpts/LICENSE @@ -0,0 +1,439 @@ +Attribution-NonCommercial-ShareAlike 4.0 International + +Copyright (c) 2016-2025 HangZhou YuShu TECHNOLOGY CO.,LTD. ("Unitree Robotics") + +======================================================================= + +Creative Commons Corporation ("Creative Commons") is not a law firm and +does not provide legal services or legal advice. Distribution of +Creative Commons public licenses does not create a lawyer-client or +other relationship. Creative Commons makes its licenses and related +information available on an "as-is" basis. Creative Commons gives no +warranties regarding its licenses, any material licensed under their +terms and conditions, or any related information. Creative Commons +disclaims all liability for damages resulting from their use to the +fullest extent possible. + +Using Creative Commons Public Licenses + +Creative Commons public licenses provide a standard set of terms and +conditions that creators and other rights holders may use to share +original works of authorship and other material subject to copyright +and certain other rights specified in the public license below. The +following considerations are for informational purposes only, are not +exhaustive, and do not form part of our licenses. + + Considerations for licensors: Our public licenses are + intended for use by those authorized to give the public + permission to use material in ways otherwise restricted by + copyright and certain other rights. Our licenses are + irrevocable. Licensors should read and understand the terms + and conditions of the license they choose before applying it. 
+ Licensors should also secure all rights necessary before + applying our licenses so that the public can reuse the + material as expected. Licensors should clearly mark any + material not subject to the license. This includes other CC- + licensed material, or material used under an exception or + limitation to copyright. More considerations for licensors: + wiki.creativecommons.org/Considerations_for_licensors + + Considerations for the public: By using one of our public + licenses, a licensor grants the public permission to use the + licensed material under specified terms and conditions. If + the licensor's permission is not necessary for any reason--for + example, because of any applicable exception or limitation to + copyright--then that use is not regulated by the license. Our + licenses grant only permissions under copyright and certain + other rights that a licensor has authority to grant. Use of + the licensed material may still be restricted for other + reasons, including because others have copyright or other + rights in the material. A licensor may make special requests, + such as asking that all changes be marked or described. + Although not required by our licenses, you are encouraged to + respect those requests where reasonable. More considerations + for the public: + wiki.creativecommons.org/Considerations_for_licensees + +======================================================================= + +Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International +Public License + +By exercising the Licensed Rights (defined below), You accept and agree +to be bound by the terms and conditions of this Creative Commons +Attribution-NonCommercial-ShareAlike 4.0 International Public License +("Public License"). 
To the extent this Public License may be +interpreted as a contract, You are granted the Licensed Rights in +consideration of Your acceptance of these terms and conditions, and the +Licensor grants You such rights in consideration of benefits the +Licensor receives from making the Licensed Material available under +these terms and conditions. + + +Section 1 -- Definitions. + + a. Adapted Material means material subject to Copyright and Similar + Rights that is derived from or based upon the Licensed Material + and in which the Licensed Material is translated, altered, + arranged, transformed, or otherwise modified in a manner requiring + permission under the Copyright and Similar Rights held by the + Licensor. For purposes of this Public License, where the Licensed + Material is a musical work, performance, or sound recording, + Adapted Material is always produced where the Licensed Material is + synched in timed relation with a moving image. + + b. Adapter's License means the license You apply to Your Copyright + and Similar Rights in Your contributions to Adapted Material in + accordance with the terms and conditions of this Public License. + + c. BY-NC-SA Compatible License means a license listed at + creativecommons.org/compatiblelicenses, approved by Creative + Commons as essentially the equivalent of this Public License. + + d. Copyright and Similar Rights means copyright and/or similar rights + closely related to copyright including, without limitation, + performance, broadcast, sound recording, and Sui Generis Database + Rights, without regard to how the rights are labeled or + categorized. For purposes of this Public License, the rights + specified in Section 2(b)(1)-(2) are not Copyright and Similar + Rights. + + e. 
Effective Technological Measures means those measures that, in the + absence of proper authority, may not be circumvented under laws + fulfilling obligations under Article 11 of the WIPO Copyright + Treaty adopted on December 20, 1996, and/or similar international + agreements. + + f. Exceptions and Limitations means fair use, fair dealing, and/or + any other exception or limitation to Copyright and Similar Rights + that applies to Your use of the Licensed Material. + + g. License Elements means the license attributes listed in the name + of a Creative Commons Public License. The License Elements of this + Public License are Attribution, NonCommercial, and ShareAlike. + + h. Licensed Material means the artistic or literary work, database, + or other material to which the Licensor applied this Public + License. + + i. Licensed Rights means the rights granted to You subject to the + terms and conditions of this Public License, which are limited to + all Copyright and Similar Rights that apply to Your use of the + Licensed Material and that the Licensor has authority to license. + + j. Licensor means the individual(s) or entity(ies) granting rights + under this Public License. + + k. NonCommercial means not primarily intended for or directed towards + commercial advantage or monetary compensation. For purposes of + this Public License, the exchange of the Licensed Material for + other material subject to Copyright and Similar Rights by digital + file-sharing or similar means is NonCommercial provided there is + no payment of monetary compensation in connection with the + exchange. + + l. 
Share means to provide material to the public by any means or + process that requires permission under the Licensed Rights, such + as reproduction, public display, public performance, distribution, + dissemination, communication, or importation, and to make material + available to the public including in ways that members of the + public may access the material from a place and at a time + individually chosen by them. + + m. Sui Generis Database Rights means rights other than copyright + resulting from Directive 96/9/EC of the European Parliament and of + the Council of 11 March 1996 on the legal protection of databases, + as amended and/or succeeded, as well as other essentially + equivalent rights anywhere in the world. + + n. You means the individual or entity exercising the Licensed Rights + under this Public License. Your has a corresponding meaning. + + +Section 2 -- Scope. + + a. License grant. + + 1. Subject to the terms and conditions of this Public License, + the Licensor hereby grants You a worldwide, royalty-free, + non-sublicensable, non-exclusive, irrevocable license to + exercise the Licensed Rights in the Licensed Material to: + + a. reproduce and Share the Licensed Material, in whole or + in part, for NonCommercial purposes only; and + + b. produce, reproduce, and Share Adapted Material for + NonCommercial purposes only. + + 2. Exceptions and Limitations. For the avoidance of doubt, where + Exceptions and Limitations apply to Your use, this Public + License does not apply, and You do not need to comply with + its terms and conditions. + + 3. Term. The term of this Public License is specified in Section + 6(a). + + 4. Media and formats; technical modifications allowed. The + Licensor authorizes You to exercise the Licensed Rights in + all media and formats whether now known or hereafter created, + and to make technical modifications necessary to do so. 
The + Licensor waives and/or agrees not to assert any right or + authority to forbid You from making technical modifications + necessary to exercise the Licensed Rights, including + technical modifications necessary to circumvent Effective + Technological Measures. For purposes of this Public License, + simply making modifications authorized by this Section 2(a) + (4) never produces Adapted Material. + + 5. Downstream recipients. + + a. Offer from the Licensor -- Licensed Material. Every + recipient of the Licensed Material automatically + receives an offer from the Licensor to exercise the + Licensed Rights under the terms and conditions of this + Public License. + + b. Additional offer from the Licensor -- Adapted Material. + Every recipient of Adapted Material from You + automatically receives an offer from the Licensor to + exercise the Licensed Rights in the Adapted Material + under the conditions of the Adapter's License You apply. + + c. No downstream restrictions. You may not offer or impose + any additional or different terms or conditions on, or + apply any Effective Technological Measures to, the + Licensed Material if doing so restricts exercise of the + Licensed Rights by any recipient of the Licensed + Material. + + 6. No endorsement. Nothing in this Public License constitutes or + may be construed as permission to assert or imply that You + are, or that Your use of the Licensed Material is, connected + with, or sponsored, endorsed, or granted official status by, + the Licensor or others designated to receive attribution as + provided in Section 3(a)(1)(A)(i). + + b. Other rights. + + 1. 
Moral rights, such as the right of integrity, are not + licensed under this Public License, nor are publicity, + privacy, and/or other similar personality rights; however, to + the extent possible, the Licensor waives and/or agrees not to + assert any such rights held by the Licensor to the limited + extent necessary to allow You to exercise the Licensed + Rights, but not otherwise. + + 2. Patent and trademark rights are not licensed under this + Public License. + + 3. To the extent possible, the Licensor waives any right to + collect royalties from You for the exercise of the Licensed + Rights, whether directly or through a collecting society + under any voluntary or waivable statutory or compulsory + licensing scheme. In all other cases the Licensor expressly + reserves any right to collect such royalties, including when + the Licensed Material is used other than for NonCommercial + purposes. + + +Section 3 -- License Conditions. + +Your exercise of the Licensed Rights is expressly made subject to the +following conditions. + + a. Attribution. + + 1. If You Share the Licensed Material (including in modified + form), You must: + + a. retain the following if it is supplied by the Licensor + with the Licensed Material: + + i. identification of the creator(s) of the Licensed + Material and any others designated to receive + attribution, in any reasonable manner requested by + the Licensor (including by pseudonym if + designated); + + ii. a copyright notice; + + iii. a notice that refers to this Public License; + + iv. a notice that refers to the disclaimer of + warranties; + + v. a URI or hyperlink to the Licensed Material to the + extent reasonably practicable; + + b. indicate if You modified the Licensed Material and + retain an indication of any previous modifications; and + + c. indicate the Licensed Material is licensed under this + Public License, and include the text of, or the URI or + hyperlink to, this Public License. + + 2. 
You may satisfy the conditions in Section 3(a)(1) in any + reasonable manner based on the medium, means, and context in + which You Share the Licensed Material. For example, it may be + reasonable to satisfy the conditions by providing a URI or + hyperlink to a resource that includes the required + information. + 3. If requested by the Licensor, You must remove any of the + information required by Section 3(a)(1)(A) to the extent + reasonably practicable. + + b. ShareAlike. + + In addition to the conditions in Section 3(a), if You Share + Adapted Material You produce, the following conditions also apply. + + 1. The Adapter's License You apply must be a Creative Commons + license with the same License Elements, this version or + later, or a BY-NC-SA Compatible License. + + 2. You must include the text of, or the URI or hyperlink to, the + Adapter's License You apply. You may satisfy this condition + in any reasonable manner based on the medium, means, and + context in which You Share Adapted Material. + + 3. You may not offer or impose any additional or different terms + or conditions on, or apply any Effective Technological + Measures to, Adapted Material that restrict exercise of the + rights granted under the Adapter's License You apply. + + +Section 4 -- Sui Generis Database Rights. + +Where the Licensed Rights include Sui Generis Database Rights that +apply to Your use of the Licensed Material: + + a. for the avoidance of doubt, Section 2(a)(1) grants You the right + to extract, reuse, reproduce, and Share all or a substantial + portion of the contents of the database for NonCommercial purposes + only; + + b. if You include all or a substantial portion of the database + contents in a database in which You have Sui Generis Database + Rights, then the database in which You have Sui Generis Database + Rights (but not its individual contents) is Adapted Material, + including for purposes of Section 3(b); and + + c. 
You must comply with the conditions in Section 3(a) if You Share + all or a substantial portion of the contents of the database. + +For the avoidance of doubt, this Section 4 supplements and does not +replace Your obligations under this Public License where the Licensed +Rights include other Copyright and Similar Rights. + + +Section 5 -- Disclaimer of Warranties and Limitation of Liability. + + a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE + EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS + AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF + ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, + IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, + WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR + PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, + ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT + KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT + ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. + + b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE + TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, + NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, + INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, + COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR + USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN + ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR + DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR + IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. + + c. The disclaimer of warranties and limitation of liability provided + above shall be interpreted in a manner that, to the extent + possible, most closely approximates an absolute disclaimer and + waiver of all liability. + + +Section 6 -- Term and Termination. + + a. This Public License applies for the term of the Copyright and + Similar Rights licensed here. 
However, if You fail to comply with + this Public License, then Your rights under this Public License + terminate automatically. + + b. Where Your right to use the Licensed Material has terminated under + Section 6(a), it reinstates: + + 1. automatically as of the date the violation is cured, provided + it is cured within 30 days of Your discovery of the + violation; or + + 2. upon express reinstatement by the Licensor. + + For the avoidance of doubt, this Section 6(b) does not affect any + right the Licensor may have to seek remedies for Your violations + of this Public License. + + c. For the avoidance of doubt, the Licensor may also offer the + Licensed Material under separate terms or conditions or stop + distributing the Licensed Material at any time; however, doing so + will not terminate this Public License. + + d. Sections 1, 5, 6, 7, and 8 survive termination of this Public + License. + + +Section 7 -- Other Terms and Conditions. + + a. The Licensor shall not be bound by any additional or different + terms or conditions communicated by You unless expressly agreed. + + b. Any arrangements, understandings, or agreements regarding the + Licensed Material not stated herein are separate from and + independent of the terms and conditions of this Public License. + + +Section 8 -- Interpretation. + + a. For the avoidance of doubt, this Public License does not, and + shall not be interpreted to, reduce, limit, restrict, or impose + conditions on any use of the Licensed Material that could lawfully + be made without permission under this Public License. + + b. To the extent possible, if any provision of this Public License is + deemed unenforceable, it shall be automatically reformed to the + minimum extent necessary to make it enforceable. If the provision + cannot be reformed, it shall be severed from this Public License + without affecting the enforceability of the remaining terms and + conditions. + + c. 
No term or condition of this Public License will be waived and no + failure to comply consented to unless expressly agreed to by the + Licensor. + + d. Nothing in this Public License constitutes or may be interpreted + as a limitation upon, or waiver of, any privileges and immunities + that apply to the Licensor or You, including from the legal + processes of any jurisdiction or authority. + +======================================================================= + +Creative Commons is not a party to its public +licenses. Notwithstanding, Creative Commons may elect to apply one of +its public licenses to material it publishes and in those instances +will be considered the “Licensor.” The text of the Creative Commons +public licenses is dedicated to the public domain under the CC0 Public +Domain Dedication. Except for the limited purpose of indicating that +material is shared under a Creative Commons public license or as +otherwise permitted by the Creative Commons policies published at +creativecommons.org/policies, Creative Commons does not authorize the +use of the trademark "Creative Commons" or any other trademark or logo +of Creative Commons without its prior written consent including, +without limitation, in connection with any unauthorized modifications +to any of its public licenses or any other arrangements, +understandings, or agreements concerning use of licensed material. For +the avoidance of doubt, this paragraph does not form part of the +public licenses. + +Creative Commons may be contacted at creativecommons.org. diff --git a/ckpts/README.md b/ckpts/README.md new file mode 100644 index 0000000..038ff9b --- /dev/null +++ b/ckpts/README.md @@ -0,0 +1,38 @@ +--- +tags: +- robotics +--- + +# UnifoLM-WMA-0: A World-Model-Action (WMA) Framework under UnifoLM Family +
+ Project Page | + Code | + Dataset +
+
+| [demo GIF] | [demo GIF] |
+|:---:|:---:|
+| [demo GIF] | [demo GIF] |
+
+**Note: the top-right window shows the world model’s prediction of future environmental changes.**
+
+## License
+The model is released under the CC BY-NC-SA 4.0 license as found in the [LICENSE](https://huggingface.co/unitreerobotics/UnifoLM-WMA-0/blob/main/LICENSE). You are responsible for ensuring that your use of Unitree AI Models complies with all applicable laws.
+
+## Model Architecture
+
+
+## Citation
+```
+@misc{unifolm-wma-0,
+ author = {Unitree},
+ title = {UnifoLM-WMA-0: A World-Model-Action (WMA) Framework under UnifoLM Family},
+ year = {2025},
+}
+```
\ No newline at end of file
diff --git a/ckpts/assets/real_cleanup_pencils.gif b/ckpts/assets/real_cleanup_pencils.gif
new file mode 100644
index 0000000..2d2cd04
Binary files /dev/null and b/ckpts/assets/real_cleanup_pencils.gif differ
diff --git a/ckpts/assets/real_dual_stackbox.gif b/ckpts/assets/real_dual_stackbox.gif
new file mode 100644
index 0000000..e0a884c
Binary files /dev/null and b/ckpts/assets/real_dual_stackbox.gif differ
diff --git a/ckpts/assets/real_g1_pack_camera.gif b/ckpts/assets/real_g1_pack_camera.gif
new file mode 100644
index 0000000..90bbf26
Binary files /dev/null and b/ckpts/assets/real_g1_pack_camera.gif differ
diff --git a/ckpts/assets/real_z1_stackbox.gif b/ckpts/assets/real_z1_stackbox.gif
new file mode 100644
index 0000000..d33a49d
Binary files /dev/null and b/ckpts/assets/real_z1_stackbox.gif differ
diff --git a/ckpts/assets/world_model_interaction.gif b/ckpts/assets/world_model_interaction.gif
new file mode 100644
index 0000000..0ec8534
Binary files /dev/null and b/ckpts/assets/world_model_interaction.gif differ
diff --git a/configs/inference/world_model_interaction.yaml b/configs/inference/world_model_interaction.yaml
index 970d029..da709e0 100644
--- a/configs/inference/world_model_interaction.yaml
+++ b/configs/inference/world_model_interaction.yaml
@@ -222,7 +222,7 @@ data:
     test:
       target: unifolm_wma.data.wma_data.WMAData
       params:
-        data_dir: '/path/to/unifolm-world-model-action/examples/world_model_interaction_prompts'
+        data_dir: '/mnt/ASC1637/unifolm-world-model-action/examples/world_model_interaction_prompts'
         video_length: ${model.params.wma_config.params.temporal_length}
         frame_stride: 2
         load_raw_resolution: True
diff --git a/psnr_score_for_challenge.py b/psnr_score_for_challenge.py
new file mode 100644
index 0000000..6223db6
--- /dev/null
+++ b/psnr_score_for_challenge.py
@@ -0,0 +1,89 @@
+import os
+import glob
+import numpy as np
+import json
+from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
+from tqdm import tqdm
+from moviepy.video.io.VideoFileClip import VideoFileClip
+import PIL.Image
+
+
+def calculate_psnr(img1, img2):
+    mse = np.mean((img1.astype(np.float64) - img2.astype(np.float64)) ** 2)
+    if mse == 0:
+        return float('inf')
+    max_pixel = 255.0
+    psnr = 20 * np.log10(max_pixel / np.sqrt(mse))
+    return psnr
+
+
+def process_video_psnr(gt_path, pred_path):
+    try:
+        clip_gt = VideoFileClip(gt_path)
+        clip_pred = VideoFileClip(pred_path)
+
+        fps = min(clip_gt.fps, clip_pred.fps)
+        duration = min(clip_gt.duration, clip_pred.duration)
+
+        time_points = np.arange(0, duration, 1.0 / fps)
+
+        video_psnrs = []
+
+        for t in time_points:
+            frame_gt = clip_gt.get_frame(t)
+            frame_pred = clip_pred.get_frame(t)
+
+            img_gt = PIL.Image.fromarray(frame_gt).resize((256, 256), PIL.Image.Resampling.BILINEAR)
+            img_pred = PIL.Image.fromarray(frame_pred).resize((256, 256), PIL.Image.Resampling.BILINEAR)
+
+            psnr = calculate_psnr(np.array(img_gt), np.array(img_pred))
+            video_psnrs.append(psnr)
+
+        clip_gt.close()
+        clip_pred.close()
+
+        return np.mean(video_psnrs) if video_psnrs else 0.0
+
+    except Exception as e:
+        print(f"Error processing {os.path.basename(gt_path)}: {e}")
+        return None
+
+
+def main():
+    parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter)
+    parser.add_argument('--gt_video', type=str, required=True, help='path to reference videos')
+    parser.add_argument('--pred_video', type=str, required=True, help='path to pred videos')
+    parser.add_argument('--output_file', type=str, default=None, help='path to output file')
+    args = parser.parse_args()
+
+    if not os.path.exists(args.gt_video):
+        print(f"Error: GT video not found at {args.gt_video}")
+        return
+    if not os.path.exists(args.pred_video):
+        print(f"Error: Pred video not found at {args.pred_video}")
+        return
+
+    print(f"Comparing:\nRef: {args.gt_video}\nPred: {args.pred_video}")
+
+    v_psnr = process_video_psnr(args.gt_video, args.pred_video)
+
+    if v_psnr is not None:
+        print("-" * 30)
+        print(f"Video PSNR: {v_psnr:.4f} dB")
+        print("-" * 30)
+
+        if args.output_file:
+            result = {
+                "gt_video": args.gt_video,
+                "pred_video": args.pred_video,
+                "psnr": v_psnr
+            }
+            with open(args.output_file, 'w') as f:
+                json.dump(result, f, indent=4)
+            print(f"Result saved to {args.output_file}")
+    else:
+        print("Failed to calculate PSNR.")
+
+
+if __name__ == '__main__':
+    main()
diff --git a/pyproject.toml b/pyproject.toml
index e08d9e6..af6c3df 100755
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -19,13 +19,13 @@ dependencies = [
     "pytorch-lightning==1.9.3",
     "pyyaml==6.0",
     "setuptools==65.6.3",
-    "torch==2.3.1",
-    "torchvision==0.18.1",
+    #"torch==2.3.1",
+    #"torchvision==0.18.1",
     "tqdm==4.66.5",
     "transformers==4.40.1",
     "moviepy==1.0.3",
     "av==12.3.0",
-    "xformers==0.0.27",
+    #"xformers==0.0.27",
     "gradio==4.39.0",
     "timm==0.9.10",
     "scikit-learn==1.5.1",
diff --git a/unitree_g1_pack_camera/case1/run_world_model_interaction.sh b/unitree_g1_pack_camera/case1/run_world_model_interaction.sh
new file mode 100644
index 0000000..e0e900f
--- /dev/null
+++ b/unitree_g1_pack_camera/case1/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_g1_pack_camera/case1"
+dataset="unitree_g1_pack_camera"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_g1_pack_camera/case1/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 6 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 11 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_g1_pack_camera/case1/world_model_interaction_prompts/images/unitree_g1_pack_camera/0.png b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/images/unitree_g1_pack_camera/0.png
new file mode 100644
index 0000000..8008d7a
Binary files /dev/null and b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/images/unitree_g1_pack_camera/0.png differ
diff --git a/unitree_g1_pack_camera/case1/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/0.h5 b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/0.h5
new file mode 100644
index 0000000..a5bf1f7
Binary files /dev/null and b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/0.h5 differ
diff --git a/unitree_g1_pack_camera/case1/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors
new file mode 100644
index 0000000..4bdf81f
Binary files /dev/null and b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors differ
diff --git a/unitree_g1_pack_camera/case1/world_model_interaction_prompts/unitree_g1_pack_camera.csv b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/unitree_g1_pack_camera.csv
new file mode 100644
index 0000000..2bdc1cd
--- /dev/null
+++ b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/unitree_g1_pack_camera.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+0,x,x,unitree_g1_pack_camera,mount camera,x,x,x,G1_Dex1,30
diff --git a/unitree_g1_pack_camera/case2/run_world_model_interaction.sh b/unitree_g1_pack_camera/case2/run_world_model_interaction.sh
new file mode 100644
index 0000000..36e613d
--- /dev/null
+++ b/unitree_g1_pack_camera/case2/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_g1_pack_camera/case2"
+dataset="unitree_g1_pack_camera"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_g1_pack_camera/case2/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 6 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 11 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_g1_pack_camera/case2/world_model_interaction_prompts/images/unitree_g1_pack_camera/50.png b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/images/unitree_g1_pack_camera/50.png
new file mode 100644
index 0000000..83eebaf
Binary files /dev/null and b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/images/unitree_g1_pack_camera/50.png differ
diff --git a/unitree_g1_pack_camera/case2/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/50.h5 b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/50.h5
new file mode 100644
index 0000000..90e741b
Binary files /dev/null and b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/50.h5 differ
diff --git a/unitree_g1_pack_camera/case2/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors
new file mode 100644
index 0000000..4bdf81f
Binary files /dev/null and b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors differ
diff --git a/unitree_g1_pack_camera/case2/world_model_interaction_prompts/unitree_g1_pack_camera.csv b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/unitree_g1_pack_camera.csv
new file mode 100644
index 0000000..35ead3a
--- /dev/null
+++ b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/unitree_g1_pack_camera.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+50,x,x,unitree_g1_pack_camera,mount camera,x,x,x,G1_Dex1,30
diff --git a/unitree_g1_pack_camera/case3/run_world_model_interaction.sh b/unitree_g1_pack_camera/case3/run_world_model_interaction.sh
new file mode 100644
index 0000000..87e3098
--- /dev/null
+++ b/unitree_g1_pack_camera/case3/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_g1_pack_camera/case3"
+dataset="unitree_g1_pack_camera"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_g1_pack_camera/case3/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 6 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 11 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_g1_pack_camera/case3/world_model_interaction_prompts/images/unitree_g1_pack_camera/100.png b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/images/unitree_g1_pack_camera/100.png
new file mode 100644
index 0000000..2f658f3
Binary files /dev/null and b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/images/unitree_g1_pack_camera/100.png differ
diff --git a/unitree_g1_pack_camera/case3/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/100.h5 b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/100.h5
new file mode 100644
index 0000000..f976464
Binary files /dev/null and b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/100.h5 differ
diff --git a/unitree_g1_pack_camera/case3/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors
new file mode 100644
index 0000000..4bdf81f
Binary files /dev/null and b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors differ
diff --git a/unitree_g1_pack_camera/case3/world_model_interaction_prompts/unitree_g1_pack_camera.csv b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/unitree_g1_pack_camera.csv
new file mode 100644
index 0000000..c6350c9
--- /dev/null
+++ b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/unitree_g1_pack_camera.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+100,x,x,unitree_g1_pack_camera,mount camera,x,x,x,G1_Dex1,30
diff --git a/unitree_g1_pack_camera/case4/run_world_model_interaction.sh b/unitree_g1_pack_camera/case4/run_world_model_interaction.sh
new file mode 100644
index 0000000..46c5217
--- /dev/null
+++ b/unitree_g1_pack_camera/case4/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_g1_pack_camera/case4"
+dataset="unitree_g1_pack_camera"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_g1_pack_camera/case4/world_model_interaction_prompts" \
+ --dataset ${dataset} \
+ --video_length 16 \
+ --frame_stride 6 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 11 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_g1_pack_camera/case4/world_model_interaction_prompts/images/unitree_g1_pack_camera/200.png b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/images/unitree_g1_pack_camera/200.png
new file mode 100644
index 0000000..3c718aa
Binary files /dev/null and b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/images/unitree_g1_pack_camera/200.png differ
diff --git a/unitree_g1_pack_camera/case4/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/200.h5 b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/200.h5
new file mode 100644
index 0000000..606c218
Binary files /dev/null and b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/200.h5 differ
diff --git a/unitree_g1_pack_camera/case4/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors
new file mode 100644
index 0000000..4bdf81f
Binary files /dev/null and b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors differ
diff --git a/unitree_g1_pack_camera/case4/world_model_interaction_prompts/unitree_g1_pack_camera.csv b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/unitree_g1_pack_camera.csv
new file mode 100644
index 0000000..1fae9f0
--- /dev/null
+++ b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/unitree_g1_pack_camera.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+200,x,x,unitree_g1_pack_camera,mount camera,x,x,x,G1_Dex1,30
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case1/run_world_model_interaction.sh b/unitree_z1_dual_arm_cleanup_pencils/case1/run_world_model_interaction.sh
new file mode 100644
index 0000000..8fe141f
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case1/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_cleanup_pencils/case1"
+dataset="unitree_z1_dual_arm_cleanup_pencils"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts" \
+ --dataset ${dataset} \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 8 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/0.png b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/0.png
new file mode 100644
index 0000000..2d8739d
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/0.png differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/0.h5 b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/0.h5
new file mode 100644
index 0000000..6b120eb
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/0.h5 differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors
new file mode 100644
index 0000000..e3194ab
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
new file mode 100644
index 0000000..a749385
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+0,x,x,unitree_z1_dual_arm_cleanup_pencils,clean up eraser and pencils,x,x,x,Z1_Dual_Dex1,30
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case2/run_world_model_interaction.sh b/unitree_z1_dual_arm_cleanup_pencils/case2/run_world_model_interaction.sh
new file mode 100644
index 0000000..2b84103
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case2/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_cleanup_pencils/case2"
+dataset="unitree_z1_dual_arm_cleanup_pencils"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts" \
+ --dataset ${dataset} \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 8 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/50.png b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/50.png
new file mode 100644
index 0000000..91725eb
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/50.png differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/50.h5 b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/50.h5
new file mode 100644
index 0000000..6c08657
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/50.h5 differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors
new file mode 100644
index 0000000..e3194ab
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
new file mode 100644
index 0000000..a754862
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+50,x,x,unitree_z1_dual_arm_cleanup_pencils,clean up eraser and pencils,x,x,x,Z1_Dual_Dex1,30
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case3/run_world_model_interaction.sh b/unitree_z1_dual_arm_cleanup_pencils/case3/run_world_model_interaction.sh
new file mode 100644
index 0000000..78c56d7
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case3/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_cleanup_pencils/case3"
+dataset="unitree_z1_dual_arm_cleanup_pencils"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts" \
+ --dataset ${dataset} \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 8 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/100.png b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/100.png
new file mode 100644
index 0000000..7cc656f
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/100.png differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/100.h5 b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/100.h5
new file mode 100644
index 0000000..185d89b
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/100.h5 differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors
new file mode 100644
index 0000000..e3194ab
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
new file mode 100644
index 0000000..3462452
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+100,x,x,unitree_z1_dual_arm_cleanup_pencils,clean up eraser and pencils,x,x,x,Z1_Dual_Dex1,30
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case4/run_world_model_interaction.sh b/unitree_z1_dual_arm_cleanup_pencils/case4/run_world_model_interaction.sh
new file mode 100644
index 0000000..9367c09
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case4/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_cleanup_pencils/case4"
+dataset="unitree_z1_dual_arm_cleanup_pencils"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts" \
+ --dataset ${dataset} \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 8 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/200.png b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/200.png
new file mode 100644
index 0000000..9934a16
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/200.png differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/200.h5 b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/200.h5
new file mode 100644
index 0000000..97ccecc
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/200.h5 differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors
new file mode 100644
index 0000000..e3194ab
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
new file mode 100644
index 0000000..498d7f1
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+200,x,x,unitree_z1_dual_arm_cleanup_pencils,clean up eraser and pencils,x,x,x,Z1_Dual_Dex1,30
diff --git a/unitree_z1_dual_arm_stackbox/case1/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox/case1/run_world_model_interaction.sh
new file mode 100644
index 0000000..0d9ed4c
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case1/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox/case1"
+dataset="unitree_z1_dual_arm_stackbox"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts" \
+ --dataset ${dataset} \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 7 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/5.png b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/5.png
new file mode 100644
index 0000000..eb6e272
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/5.png differ
diff --git a/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/5.h5 b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/5.h5
new file mode 100644
index 0000000..af951c1
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/5.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors
new file mode 100644
index 0000000..fa7fd40
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
new file mode 100644
index 0000000..6e7f0a8
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+5,x,x,unitree_z1_dual_arm_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox/case2/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox/case2/run_world_model_interaction.sh
new file mode 100644
index 0000000..7b6d005
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case2/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox/case2"
+dataset="unitree_z1_dual_arm_stackbox"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts" \
+ --dataset ${dataset} \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 7 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/15.png b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/15.png
new file mode 100644
index 0000000..676341b
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/15.png differ
diff --git a/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/15.h5 b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/15.h5
new file mode 100644
index 0000000..bf66fa5
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/15.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors
new file mode 100644
index 0000000..fa7fd40
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
new file mode 100644
index 0000000..79f4f8c
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+15,x,x,unitree_z1_dual_arm_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox/case3/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox/case3/run_world_model_interaction.sh
new file mode 100644
index 0000000..1058f25
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case3/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox/case3"
+dataset="unitree_z1_dual_arm_stackbox"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts" \
+ --dataset ${dataset} \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 7 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/25.png b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/25.png
new file mode 100644
index 0000000..5540f09
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/25.png differ
diff --git a/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/25.h5 b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/25.h5
new file mode 100644
index 0000000..8a6ca42
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/25.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors
new file mode 100644
index 0000000..fa7fd40
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
new file mode 100644
index 0000000..3bbd2da
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+25,x,x,unitree_z1_dual_arm_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox/case4/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox/case4/run_world_model_interaction.sh
new file mode 100644
index 0000000..fa46100
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case4/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox/case4"
+dataset="unitree_z1_dual_arm_stackbox"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts" \
+ --dataset ${dataset} \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 7 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/35.png b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/35.png
new file mode 100644
index 0000000..f3ec0a3
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/35.png differ
diff --git a/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/35.h5 b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/35.h5
new file mode 100644
index 0000000..875155b
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/35.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors
new file mode 100644
index 0000000..fa7fd40
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
new file mode 100644
index 0000000..f22144c
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+35,x,x,unitree_z1_dual_arm_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox_v2/case1/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox_v2/case1/run_world_model_interaction.sh
new file mode 100644
index 0000000..bdcbbff
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case1/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox_v2/case1"
+dataset="unitree_z1_dual_arm_stackbox_v2"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts" \
+ --dataset ${dataset} \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 11 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/5.png b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/5.png
new file mode 100644
index 0000000..2371c4d
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/5.png differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/5.h5 b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/5.h5
new file mode 100644
index 0000000..a999fc7
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/5.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors
new file mode 100644
index 0000000..6ef7a6c
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv
new file mode 100644
index 0000000..4591e75
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+5,x,x,unitree_z1_dual_arm_stackbox_v2,"Stack the blocks in the rectangular block: red at the bottom, yellow in the middle, green on top",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox_v2/case2/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox_v2/case2/run_world_model_interaction.sh
new file mode 100644
index 0000000..2c94946
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case2/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox_v2/case2"
+dataset="unitree_z1_dual_arm_stackbox_v2"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts" \
+ --dataset "${dataset}" \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 11 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/15.png b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/15.png
new file mode 100644
index 0000000..aab83f1
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/15.png differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/15.h5 b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/15.h5
new file mode 100644
index 0000000..0a6bb8f
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/15.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors
new file mode 100644
index 0000000..6ef7a6c
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv
new file mode 100644
index 0000000..8cc81d4
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+15,x,x,unitree_z1_dual_arm_stackbox_v2,"Stack the blocks in the rectangular block: red at the bottom, yellow in the middle, green on top",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox_v2/case3/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox_v2/case3/run_world_model_interaction.sh
new file mode 100644
index 0000000..6708ee9
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case3/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox_v2/case3"
+dataset="unitree_z1_dual_arm_stackbox_v2"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts" \
+ --dataset "${dataset}" \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 11 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/25.png b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/25.png
new file mode 100644
index 0000000..f800036
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/25.png differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/25.h5 b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/25.h5
new file mode 100644
index 0000000..966e7cc
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/25.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors
new file mode 100644
index 0000000..6ef7a6c
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv
new file mode 100644
index 0000000..4e1d4ee
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+25,x,x,unitree_z1_dual_arm_stackbox_v2,"Stack the blocks in the rectangular block: red at the bottom, yellow in the middle, green on top",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox_v2/case4/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox_v2/case4/run_world_model_interaction.sh
new file mode 100644
index 0000000..370c1c3
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case4/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox_v2/case4"
+dataset="unitree_z1_dual_arm_stackbox_v2"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts" \
+ --dataset "${dataset}" \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 11 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/35.png b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/35.png
new file mode 100644
index 0000000..d760f72
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/35.png differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/35.h5 b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/35.h5
new file mode 100644
index 0000000..d9adda8
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/35.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors
new file mode 100644
index 0000000..6ef7a6c
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv
new file mode 100644
index 0000000..43c4b92
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+35,x,x,unitree_z1_dual_arm_stackbox_v2,"Stack the blocks in the rectangular block: red at the bottom, yellow in the middle, green on top",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_stackbox/case1/run_world_model_interaction.sh b/unitree_z1_stackbox/case1/run_world_model_interaction.sh
new file mode 100644
index 0000000..73d9132
--- /dev/null
+++ b/unitree_z1_stackbox/case1/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_stackbox/case1"
+dataset="unitree_z1_stackbox"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_stackbox/case1/world_model_interaction_prompts" \
+ --dataset "${dataset}" \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 12 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_stackbox/case1/world_model_interaction_prompts/images/unitree_z1_stackbox/5.png b/unitree_z1_stackbox/case1/world_model_interaction_prompts/images/unitree_z1_stackbox/5.png
new file mode 100644
index 0000000..8e265c0
Binary files /dev/null and b/unitree_z1_stackbox/case1/world_model_interaction_prompts/images/unitree_z1_stackbox/5.png differ
diff --git a/unitree_z1_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_stackbox/5.h5 b/unitree_z1_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_stackbox/5.h5
new file mode 100644
index 0000000..fa647f1
Binary files /dev/null and b/unitree_z1_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_stackbox/5.h5 differ
diff --git a/unitree_z1_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors b/unitree_z1_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors
new file mode 100644
index 0000000..1918ea0
Binary files /dev/null and b/unitree_z1_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors differ
diff --git a/unitree_z1_stackbox/case1/world_model_interaction_prompts/unitree_z1_stackbox.csv b/unitree_z1_stackbox/case1/world_model_interaction_prompts/unitree_z1_stackbox.csv
new file mode 100644
index 0000000..8f55185
--- /dev/null
+++ b/unitree_z1_stackbox/case1/world_model_interaction_prompts/unitree_z1_stackbox.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+5,x,x,unitree_z1_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Arm,30
diff --git a/unitree_z1_stackbox/case2/run_world_model_interaction.sh b/unitree_z1_stackbox/case2/run_world_model_interaction.sh
new file mode 100644
index 0000000..95fb33b
--- /dev/null
+++ b/unitree_z1_stackbox/case2/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_stackbox/case2"
+dataset="unitree_z1_stackbox"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_stackbox/case2/world_model_interaction_prompts" \
+ --dataset "${dataset}" \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 12 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_stackbox/case2/world_model_interaction_prompts/images/unitree_z1_stackbox/15.png b/unitree_z1_stackbox/case2/world_model_interaction_prompts/images/unitree_z1_stackbox/15.png
new file mode 100644
index 0000000..2b7be22
Binary files /dev/null and b/unitree_z1_stackbox/case2/world_model_interaction_prompts/images/unitree_z1_stackbox/15.png differ
diff --git a/unitree_z1_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_stackbox/15.h5 b/unitree_z1_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_stackbox/15.h5
new file mode 100644
index 0000000..4a71e9f
Binary files /dev/null and b/unitree_z1_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_stackbox/15.h5 differ
diff --git a/unitree_z1_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors b/unitree_z1_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors
new file mode 100644
index 0000000..1918ea0
Binary files /dev/null and b/unitree_z1_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors differ
diff --git a/unitree_z1_stackbox/case2/world_model_interaction_prompts/unitree_z1_stackbox.csv b/unitree_z1_stackbox/case2/world_model_interaction_prompts/unitree_z1_stackbox.csv
new file mode 100644
index 0000000..bde4468
--- /dev/null
+++ b/unitree_z1_stackbox/case2/world_model_interaction_prompts/unitree_z1_stackbox.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+15,x,x,unitree_z1_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Arm,30
diff --git a/unitree_z1_stackbox/case3/run_world_model_interaction.sh b/unitree_z1_stackbox/case3/run_world_model_interaction.sh
new file mode 100644
index 0000000..d92501c
--- /dev/null
+++ b/unitree_z1_stackbox/case3/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_stackbox/case3"
+dataset="unitree_z1_stackbox"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_stackbox/case3/world_model_interaction_prompts" \
+ --dataset "${dataset}" \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 12 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_stackbox/case3/world_model_interaction_prompts/images/unitree_z1_stackbox/25.png b/unitree_z1_stackbox/case3/world_model_interaction_prompts/images/unitree_z1_stackbox/25.png
new file mode 100644
index 0000000..1365fd5
Binary files /dev/null and b/unitree_z1_stackbox/case3/world_model_interaction_prompts/images/unitree_z1_stackbox/25.png differ
diff --git a/unitree_z1_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_stackbox/25.h5 b/unitree_z1_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_stackbox/25.h5
new file mode 100644
index 0000000..27c0773
Binary files /dev/null and b/unitree_z1_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_stackbox/25.h5 differ
diff --git a/unitree_z1_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors b/unitree_z1_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors
new file mode 100644
index 0000000..1918ea0
Binary files /dev/null and b/unitree_z1_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors differ
diff --git a/unitree_z1_stackbox/case3/world_model_interaction_prompts/unitree_z1_stackbox.csv b/unitree_z1_stackbox/case3/world_model_interaction_prompts/unitree_z1_stackbox.csv
new file mode 100644
index 0000000..a32f631
--- /dev/null
+++ b/unitree_z1_stackbox/case3/world_model_interaction_prompts/unitree_z1_stackbox.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+25,x,x,unitree_z1_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Arm,30
diff --git a/unitree_z1_stackbox/case4/run_world_model_interaction.sh b/unitree_z1_stackbox/case4/run_world_model_interaction.sh
new file mode 100644
index 0000000..054b175
--- /dev/null
+++ b/unitree_z1_stackbox/case4/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_stackbox/case4"
+dataset="unitree_z1_stackbox"
+
+{
+ time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+ --seed 123 \
+ --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+ --config configs/inference/world_model_interaction.yaml \
+ --savedir "${res_dir}/output" \
+ --bs 1 --height 320 --width 512 \
+ --unconditional_guidance_scale 1.0 \
+ --ddim_steps 50 \
+ --ddim_eta 1.0 \
+ --prompt_dir "unitree_z1_stackbox/case4/world_model_interaction_prompts" \
+ --dataset "${dataset}" \
+ --video_length 16 \
+ --frame_stride 4 \
+ --n_action_steps 16 \
+ --exe_steps 16 \
+ --n_iter 12 \
+ --timestep_spacing 'uniform_trailing' \
+ --guidance_rescale 0.7 \
+ --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_stackbox/case4/world_model_interaction_prompts/images/unitree_z1_stackbox/35.png b/unitree_z1_stackbox/case4/world_model_interaction_prompts/images/unitree_z1_stackbox/35.png
new file mode 100644
index 0000000..67736af
Binary files /dev/null and b/unitree_z1_stackbox/case4/world_model_interaction_prompts/images/unitree_z1_stackbox/35.png differ
diff --git a/unitree_z1_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_stackbox/35.h5 b/unitree_z1_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_stackbox/35.h5
new file mode 100644
index 0000000..94322f7
Binary files /dev/null and b/unitree_z1_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_stackbox/35.h5 differ
diff --git a/unitree_z1_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors b/unitree_z1_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors
new file mode 100644
index 0000000..1918ea0
Binary files /dev/null and b/unitree_z1_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors differ
diff --git a/unitree_z1_stackbox/case4/world_model_interaction_prompts/unitree_z1_stackbox.csv b/unitree_z1_stackbox/case4/world_model_interaction_prompts/unitree_z1_stackbox.csv
new file mode 100644
index 0000000..2f0bbc0
--- /dev/null
+++ b/unitree_z1_stackbox/case4/world_model_interaction_prompts/unitree_z1_stackbox.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+35,x,x,unitree_z1_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Arm,30