diff --git a/.gitignore b/.gitignore index 1735dae..2ae054d 100644 --- a/.gitignore +++ b/.gitignore @@ -128,3 +128,5 @@ Data/Pretrained Data/utils.py Experiment/checkpoint Experiment/log + +*.ckpt \ No newline at end of file diff --git a/ckpts/LICENSE b/ckpts/LICENSE new file mode 100644 index 0000000..5522eea --- /dev/null +++ b/ckpts/LICENSE @@ -0,0 +1,439 @@ +Attribution-NonCommercial-ShareAlike 4.0 International + +Copyright (c) 2016-2025 HangZhou YuShu TECHNOLOGY CO.,LTD. ("Unitree Robotics") + +======================================================================= + +Creative Commons Corporation ("Creative Commons") is not a law firm and +does not provide legal services or legal advice. Distribution of +Creative Commons public licenses does not create a lawyer-client or +other relationship. Creative Commons makes its licenses and related +information available on an "as-is" basis. Creative Commons gives no +warranties regarding its licenses, any material licensed under their +terms and conditions, or any related information. Creative Commons +disclaims all liability for damages resulting from their use to the +fullest extent possible. + +Using Creative Commons Public Licenses + +Creative Commons public licenses provide a standard set of terms and +conditions that creators and other rights holders may use to share +original works of authorship and other material subject to copyright +and certain other rights specified in the public license below. The +following considerations are for informational purposes only, are not +exhaustive, and do not form part of our licenses. + + Considerations for licensors: Our public licenses are + intended for use by those authorized to give the public + permission to use material in ways otherwise restricted by + copyright and certain other rights. Our licenses are + irrevocable. Licensors should read and understand the terms + and conditions of the license they choose before applying it. 
+ Licensors should also secure all rights necessary before + applying our licenses so that the public can reuse the + material as expected. Licensors should clearly mark any + material not subject to the license. This includes other CC- + licensed material, or material used under an exception or + limitation to copyright. More considerations for licensors: + wiki.creativecommons.org/Considerations_for_licensors + + Considerations for the public: By using one of our public + licenses, a licensor grants the public permission to use the + licensed material under specified terms and conditions. If + the licensor's permission is not necessary for any reason--for + example, because of any applicable exception or limitation to + copyright--then that use is not regulated by the license. Our + licenses grant only permissions under copyright and certain + other rights that a licensor has authority to grant. Use of + the licensed material may still be restricted for other + reasons, including because others have copyright or other + rights in the material. A licensor may make special requests, + such as asking that all changes be marked or described. + Although not required by our licenses, you are encouraged to + respect those requests where reasonable. More considerations + for the public: + wiki.creativecommons.org/Considerations_for_licensees + +======================================================================= + +Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International +Public License + +By exercising the Licensed Rights (defined below), You accept and agree +to be bound by the terms and conditions of this Creative Commons +Attribution-NonCommercial-ShareAlike 4.0 International Public License +("Public License"). 
To the extent this Public License may be +interpreted as a contract, You are granted the Licensed Rights in +consideration of Your acceptance of these terms and conditions, and the +Licensor grants You such rights in consideration of benefits the +Licensor receives from making the Licensed Material available under +these terms and conditions. + + +Section 1 -- Definitions. + + a. Adapted Material means material subject to Copyright and Similar + Rights that is derived from or based upon the Licensed Material + and in which the Licensed Material is translated, altered, + arranged, transformed, or otherwise modified in a manner requiring + permission under the Copyright and Similar Rights held by the + Licensor. For purposes of this Public License, where the Licensed + Material is a musical work, performance, or sound recording, + Adapted Material is always produced where the Licensed Material is + synched in timed relation with a moving image. + + b. Adapter's License means the license You apply to Your Copyright + and Similar Rights in Your contributions to Adapted Material in + accordance with the terms and conditions of this Public License. + + c. BY-NC-SA Compatible License means a license listed at + creativecommons.org/compatiblelicenses, approved by Creative + Commons as essentially the equivalent of this Public License. + + d. Copyright and Similar Rights means copyright and/or similar rights + closely related to copyright including, without limitation, + performance, broadcast, sound recording, and Sui Generis Database + Rights, without regard to how the rights are labeled or + categorized. For purposes of this Public License, the rights + specified in Section 2(b)(1)-(2) are not Copyright and Similar + Rights. + + e. 
Effective Technological Measures means those measures that, in the + absence of proper authority, may not be circumvented under laws + fulfilling obligations under Article 11 of the WIPO Copyright + Treaty adopted on December 20, 1996, and/or similar international + agreements. + + f. Exceptions and Limitations means fair use, fair dealing, and/or + any other exception or limitation to Copyright and Similar Rights + that applies to Your use of the Licensed Material. + + g. License Elements means the license attributes listed in the name + of a Creative Commons Public License. The License Elements of this + Public License are Attribution, NonCommercial, and ShareAlike. + + h. Licensed Material means the artistic or literary work, database, + or other material to which the Licensor applied this Public + License. + + i. Licensed Rights means the rights granted to You subject to the + terms and conditions of this Public License, which are limited to + all Copyright and Similar Rights that apply to Your use of the + Licensed Material and that the Licensor has authority to license. + + j. Licensor means the individual(s) or entity(ies) granting rights + under this Public License. + + k. NonCommercial means not primarily intended for or directed towards + commercial advantage or monetary compensation. For purposes of + this Public License, the exchange of the Licensed Material for + other material subject to Copyright and Similar Rights by digital + file-sharing or similar means is NonCommercial provided there is + no payment of monetary compensation in connection with the + exchange. + + l. 
Share means to provide material to the public by any means or + process that requires permission under the Licensed Rights, such + as reproduction, public display, public performance, distribution, + dissemination, communication, or importation, and to make material + available to the public including in ways that members of the + public may access the material from a place and at a time + individually chosen by them. + + m. Sui Generis Database Rights means rights other than copyright + resulting from Directive 96/9/EC of the European Parliament and of + the Council of 11 March 1996 on the legal protection of databases, + as amended and/or succeeded, as well as other essentially + equivalent rights anywhere in the world. + + n. You means the individual or entity exercising the Licensed Rights + under this Public License. Your has a corresponding meaning. + + +Section 2 -- Scope. + + a. License grant. + + 1. Subject to the terms and conditions of this Public License, + the Licensor hereby grants You a worldwide, royalty-free, + non-sublicensable, non-exclusive, irrevocable license to + exercise the Licensed Rights in the Licensed Material to: + + a. reproduce and Share the Licensed Material, in whole or + in part, for NonCommercial purposes only; and + + b. produce, reproduce, and Share Adapted Material for + NonCommercial purposes only. + + 2. Exceptions and Limitations. For the avoidance of doubt, where + Exceptions and Limitations apply to Your use, this Public + License does not apply, and You do not need to comply with + its terms and conditions. + + 3. Term. The term of this Public License is specified in Section + 6(a). + + 4. Media and formats; technical modifications allowed. The + Licensor authorizes You to exercise the Licensed Rights in + all media and formats whether now known or hereafter created, + and to make technical modifications necessary to do so. 
The + Licensor waives and/or agrees not to assert any right or + authority to forbid You from making technical modifications + necessary to exercise the Licensed Rights, including + technical modifications necessary to circumvent Effective + Technological Measures. For purposes of this Public License, + simply making modifications authorized by this Section 2(a) + (4) never produces Adapted Material. + + 5. Downstream recipients. + + a. Offer from the Licensor -- Licensed Material. Every + recipient of the Licensed Material automatically + receives an offer from the Licensor to exercise the + Licensed Rights under the terms and conditions of this + Public License. + + b. Additional offer from the Licensor -- Adapted Material. + Every recipient of Adapted Material from You + automatically receives an offer from the Licensor to + exercise the Licensed Rights in the Adapted Material + under the conditions of the Adapter's License You apply. + + c. No downstream restrictions. You may not offer or impose + any additional or different terms or conditions on, or + apply any Effective Technological Measures to, the + Licensed Material if doing so restricts exercise of the + Licensed Rights by any recipient of the Licensed + Material. + + 6. No endorsement. Nothing in this Public License constitutes or + may be construed as permission to assert or imply that You + are, or that Your use of the Licensed Material is, connected + with, or sponsored, endorsed, or granted official status by, + the Licensor or others designated to receive attribution as + provided in Section 3(a)(1)(A)(i). + + b. Other rights. + + 1. 
Moral rights, such as the right of integrity, are not + licensed under this Public License, nor are publicity, + privacy, and/or other similar personality rights; however, to + the extent possible, the Licensor waives and/or agrees not to + assert any such rights held by the Licensor to the limited + extent necessary to allow You to exercise the Licensed + Rights, but not otherwise. + + 2. Patent and trademark rights are not licensed under this + Public License. + + 3. To the extent possible, the Licensor waives any right to + collect royalties from You for the exercise of the Licensed + Rights, whether directly or through a collecting society + under any voluntary or waivable statutory or compulsory + licensing scheme. In all other cases the Licensor expressly + reserves any right to collect such royalties, including when + the Licensed Material is used other than for NonCommercial + purposes. + + +Section 3 -- License Conditions. + +Your exercise of the Licensed Rights is expressly made subject to the +following conditions. + + a. Attribution. + + 1. If You Share the Licensed Material (including in modified + form), You must: + + a. retain the following if it is supplied by the Licensor + with the Licensed Material: + + i. identification of the creator(s) of the Licensed + Material and any others designated to receive + attribution, in any reasonable manner requested by + the Licensor (including by pseudonym if + designated); + + ii. a copyright notice; + + iii. a notice that refers to this Public License; + + iv. a notice that refers to the disclaimer of + warranties; + + v. a URI or hyperlink to the Licensed Material to the + extent reasonably practicable; + + b. indicate if You modified the Licensed Material and + retain an indication of any previous modifications; and + + c. indicate the Licensed Material is licensed under this + Public License, and include the text of, or the URI or + hyperlink to, this Public License. + + 2. 
You may satisfy the conditions in Section 3(a)(1) in any + reasonable manner based on the medium, means, and context in + which You Share the Licensed Material. For example, it may be + reasonable to satisfy the conditions by providing a URI or + hyperlink to a resource that includes the required + information. + 3. If requested by the Licensor, You must remove any of the + information required by Section 3(a)(1)(A) to the extent + reasonably practicable. + + b. ShareAlike. + + In addition to the conditions in Section 3(a), if You Share + Adapted Material You produce, the following conditions also apply. + + 1. The Adapter's License You apply must be a Creative Commons + license with the same License Elements, this version or + later, or a BY-NC-SA Compatible License. + + 2. You must include the text of, or the URI or hyperlink to, the + Adapter's License You apply. You may satisfy this condition + in any reasonable manner based on the medium, means, and + context in which You Share Adapted Material. + + 3. You may not offer or impose any additional or different terms + or conditions on, or apply any Effective Technological + Measures to, Adapted Material that restrict exercise of the + rights granted under the Adapter's License You apply. + + +Section 4 -- Sui Generis Database Rights. + +Where the Licensed Rights include Sui Generis Database Rights that +apply to Your use of the Licensed Material: + + a. for the avoidance of doubt, Section 2(a)(1) grants You the right + to extract, reuse, reproduce, and Share all or a substantial + portion of the contents of the database for NonCommercial purposes + only; + + b. if You include all or a substantial portion of the database + contents in a database in which You have Sui Generis Database + Rights, then the database in which You have Sui Generis Database + Rights (but not its individual contents) is Adapted Material, + including for purposes of Section 3(b); and + + c. 
You must comply with the conditions in Section 3(a) if You Share + all or a substantial portion of the contents of the database. + +For the avoidance of doubt, this Section 4 supplements and does not +replace Your obligations under this Public License where the Licensed +Rights include other Copyright and Similar Rights. + + +Section 5 -- Disclaimer of Warranties and Limitation of Liability. + + a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE + EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS + AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF + ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, + IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, + WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR + PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, + ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT + KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT + ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. + + b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE + TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, + NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, + INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, + COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR + USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN + ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR + DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR + IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. + + c. The disclaimer of warranties and limitation of liability provided + above shall be interpreted in a manner that, to the extent + possible, most closely approximates an absolute disclaimer and + waiver of all liability. + + +Section 6 -- Term and Termination. + + a. This Public License applies for the term of the Copyright and + Similar Rights licensed here. 
However, if You fail to comply with + this Public License, then Your rights under this Public License + terminate automatically. + + b. Where Your right to use the Licensed Material has terminated under + Section 6(a), it reinstates: + + 1. automatically as of the date the violation is cured, provided + it is cured within 30 days of Your discovery of the + violation; or + + 2. upon express reinstatement by the Licensor. + + For the avoidance of doubt, this Section 6(b) does not affect any + right the Licensor may have to seek remedies for Your violations + of this Public License. + + c. For the avoidance of doubt, the Licensor may also offer the + Licensed Material under separate terms or conditions or stop + distributing the Licensed Material at any time; however, doing so + will not terminate this Public License. + + d. Sections 1, 5, 6, 7, and 8 survive termination of this Public + License. + + +Section 7 -- Other Terms and Conditions. + + a. The Licensor shall not be bound by any additional or different + terms or conditions communicated by You unless expressly agreed. + + b. Any arrangements, understandings, or agreements regarding the + Licensed Material not stated herein are separate from and + independent of the terms and conditions of this Public License. + + +Section 8 -- Interpretation. + + a. For the avoidance of doubt, this Public License does not, and + shall not be interpreted to, reduce, limit, restrict, or impose + conditions on any use of the Licensed Material that could lawfully + be made without permission under this Public License. + + b. To the extent possible, if any provision of this Public License is + deemed unenforceable, it shall be automatically reformed to the + minimum extent necessary to make it enforceable. If the provision + cannot be reformed, it shall be severed from this Public License + without affecting the enforceability of the remaining terms and + conditions. + + c. 
No term or condition of this Public License will be waived and no + failure to comply consented to unless expressly agreed to by the + Licensor. + + d. Nothing in this Public License constitutes or may be interpreted + as a limitation upon, or waiver of, any privileges and immunities + that apply to the Licensor or You, including from the legal + processes of any jurisdiction or authority. + +======================================================================= + +Creative Commons is not a party to its public +licenses. Notwithstanding, Creative Commons may elect to apply one of +its public licenses to material it publishes and in those instances +will be considered the “Licensor.” The text of the Creative Commons +public licenses is dedicated to the public domain under the CC0 Public +Domain Dedication. Except for the limited purpose of indicating that +material is shared under a Creative Commons public license or as +otherwise permitted by the Creative Commons policies published at +creativecommons.org/policies, Creative Commons does not authorize the +use of the trademark "Creative Commons" or any other trademark or logo +of Creative Commons without its prior written consent including, +without limitation, in connection with any unauthorized modifications +to any of its public licenses or any other arrangements, +understandings, or agreements concerning use of licensed material. For +the avoidance of doubt, this paragraph does not form part of the +public licenses. + +Creative Commons may be contacted at creativecommons.org. diff --git a/ckpts/README.md b/ckpts/README.md new file mode 100644 index 0000000..038ff9b --- /dev/null +++ b/ckpts/README.md @@ -0,0 +1,38 @@ +--- +tags: +- robotics +--- + +# UnifoLM-WMA-0: A World-Model-Action (WMA) Framework under UnifoLM Family +

+ Project Page | + Code | + Dataset +

+
+
+ UnifoLM-WMA-0 is Unitree's first open-source world-model–action architecture spanning multiple types of robotic embodiments, designed specifically for general-purpose robot learning. Its core component is a world-model capable of understanding the physical interactions between robots and their environments. This world-model provides two key functions: (a) Simulation Engine – operates as an interactive simulator to generate synthetic data for robot learning; (b) Policy Enhancement – connects with an action head and, by predicting future interaction processes with the world-model, further optimizes decision-making performance. +
+
+ +## 🦾 Real Robot Deployment +| | | +|:---:|:---:| +| | | + +**Note: the top-right window shows the world model’s prediction of future environmental changes.** + +## License +The model is released under the CC BY-NC-SA 4.0 license as found in the [LICENSE](https://huggingface.co/unitreerobotics/UnifoLM-WMA-0/blob/main/LICENSE). You are responsible for ensuring that your use of Unitree AI Models complies with all applicable laws. + +## Model Architecture +![Demo](assets/world_model_interaction.gif) + +## Citation +``` +@misc{unifolm-wma-0, + author = {Unitree}, + title = {UnifoLM-WMA-0: A World-Model-Action (WMA) Framework under UnifoLM Family}, + year = {2025}, +} +``` \ No newline at end of file diff --git a/ckpts/assets/real_cleanup_pencils.gif b/ckpts/assets/real_cleanup_pencils.gif new file mode 100644 index 0000000..2d2cd04 Binary files /dev/null and b/ckpts/assets/real_cleanup_pencils.gif differ diff --git a/ckpts/assets/real_dual_stackbox.gif b/ckpts/assets/real_dual_stackbox.gif new file mode 100644 index 0000000..e0a884c Binary files /dev/null and b/ckpts/assets/real_dual_stackbox.gif differ diff --git a/ckpts/assets/real_g1_pack_camera.gif b/ckpts/assets/real_g1_pack_camera.gif new file mode 100644 index 0000000..90bbf26 Binary files /dev/null and b/ckpts/assets/real_g1_pack_camera.gif differ diff --git a/ckpts/assets/real_z1_stackbox.gif b/ckpts/assets/real_z1_stackbox.gif new file mode 100644 index 0000000..d33a49d Binary files /dev/null and b/ckpts/assets/real_z1_stackbox.gif differ diff --git a/ckpts/assets/world_model_interaction.gif b/ckpts/assets/world_model_interaction.gif new file mode 100644 index 0000000..0ec8534 Binary files /dev/null and b/ckpts/assets/world_model_interaction.gif differ diff --git a/configs/inference/world_model_interaction.yaml b/configs/inference/world_model_interaction.yaml index 970d029..da709e0 100644 --- a/configs/inference/world_model_interaction.yaml +++ b/configs/inference/world_model_interaction.yaml @@ -222,7 
+222,7 @@ data: test: target: unifolm_wma.data.wma_data.WMAData params: - data_dir: '/path/to/unifolm-world-model-action/examples/world_model_interaction_prompts' + data_dir: '/mnt/ASC1637/unifolm-world-model-action/examples/world_model_interaction_prompts' video_length: ${model.params.wma_config.params.temporal_length} frame_stride: 2 load_raw_resolution: True diff --git a/psnr_score_for_challenge.py b/psnr_score_for_challenge.py new file mode 100644 index 0000000..6223db6 --- /dev/null +++ b/psnr_score_for_challenge.py @@ -0,0 +1,89 @@ +import os +import glob +import numpy as np +import json +from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter +from tqdm import tqdm +from moviepy.video.io.VideoFileClip import VideoFileClip +import PIL.Image + + +def calculate_psnr(img1, img2): + mse = np.mean((img1.astype(np.float64) - img2.astype(np.float64)) ** 2) + if mse == 0: + return float('inf') + max_pixel = 255.0 + psnr = 20 * np.log10(max_pixel / np.sqrt(mse)) + return psnr + + +def process_video_psnr(gt_path, pred_path): + try: + clip_gt = VideoFileClip(gt_path) + clip_pred = VideoFileClip(pred_path) + + fps = min(clip_gt.fps, clip_pred.fps) + duration = min(clip_gt.duration, clip_pred.duration) + + time_points = np.arange(0, duration, 1.0 / fps) + + video_psnrs = [] + + for t in time_points: + frame_gt = clip_gt.get_frame(t) + frame_pred = clip_pred.get_frame(t) + + img_gt = PIL.Image.fromarray(frame_gt).resize((256, 256), PIL.Image.Resampling.BILINEAR) + img_pred = PIL.Image.fromarray(frame_pred).resize((256, 256), PIL.Image.Resampling.BILINEAR) + + psnr = calculate_psnr(np.array(img_gt), np.array(img_pred)) + video_psnrs.append(psnr) + + clip_gt.close() + clip_pred.close() + + return np.mean(video_psnrs) if video_psnrs else 0.0 + + except Exception as e: + print(f"Error processing {os.path.basename(gt_path)}: {e}") + return None + + +def main(): + parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter) + 
parser.add_argument('--gt_video', type=str, required=True, help='path to reference videos') + parser.add_argument('--pred_video', type=str, required=True, help='path to pred videos') + parser.add_argument('--output_file', type=str, default=None, help='path to output file') + args = parser.parse_args() + + if not os.path.exists(args.gt_video): + print(f"Error: GT video not found at {args.gt_video}") + return + if not os.path.exists(args.pred_video): + print(f"Error: Pred video not found at {args.pred_video}") + return + + print(f"Comparing:\nRef: {args.gt_video}\nPred: {args.pred_video}") + + v_psnr = process_video_psnr(args.gt_video, args.pred_video) + + if v_psnr is not None: + print("-" * 30) + print(f"Video PSNR: {v_psnr:.4f} dB") + print("-" * 30) + + if args.output_file: + result = { + "gt_video": args.gt_video, + "pred_video": args.pred_video, + "psnr": v_psnr + } + with open(args.output_file, 'w') as f: + json.dump(result, f, indent=4) + print(f"Result saved to {args.output_file}") + else: + print("Failed to calculate PSNR.") + + +if __name__ == '__main__': + main() diff --git a/pyproject.toml b/pyproject.toml index e08d9e6..af6c3df 100755 --- a/pyproject.toml +++ b/pyproject.toml @@ -19,13 +19,13 @@ dependencies = [ "pytorch-lightning==1.9.3", "pyyaml==6.0", "setuptools==65.6.3", - "torch==2.3.1", - "torchvision==0.18.1", + #"torch==2.3.1", + #"torchvision==0.18.1", "tqdm==4.66.5", "transformers==4.40.1", "moviepy==1.0.3", "av==12.3.0", - "xformers==0.0.27", + #"xformers==0.0.27", "gradio==4.39.0", "timm==0.9.10", "scikit-learn==1.5.1", diff --git a/unitree_g1_pack_camera/case1/run_world_model_interaction.sh b/unitree_g1_pack_camera/case1/run_world_model_interaction.sh new file mode 100644 index 0000000..e0e900f --- /dev/null +++ b/unitree_g1_pack_camera/case1/run_world_model_interaction.sh @@ -0,0 +1,24 @@ +res_dir="unitree_g1_pack_camera/case1" +dataset="unitree_g1_pack_camera" + +{ + time CUDA_VISIBLE_DEVICES=0 python3 
scripts/evaluation/world_model_interaction.py \ + --seed 123 \ + --ckpt_path ckpts/unifolm_wma_dual.ckpt \ + --config configs/inference/world_model_interaction.yaml \ + --savedir "${res_dir}/output" \ + --bs 1 --height 320 --width 512 \ + --unconditional_guidance_scale 1.0 \ + --ddim_steps 50 \ + --ddim_eta 1.0 \ + --prompt_dir "unitree_g1_pack_camera/case1/world_model_interaction_prompts" \ + --dataset ${dataset} \ + --video_length 16 \ + --frame_stride 6 \ + --n_action_steps 16 \ + --exe_steps 16 \ + --n_iter 11 \ + --timestep_spacing 'uniform_trailing' \ + --guidance_rescale 0.7 \ + --perframe_ae +} 2>&1 | tee "${res_dir}/output.log" diff --git a/unitree_g1_pack_camera/case1/world_model_interaction_prompts/images/unitree_g1_pack_camera/0.png b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/images/unitree_g1_pack_camera/0.png new file mode 100644 index 0000000..8008d7a Binary files /dev/null and b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/images/unitree_g1_pack_camera/0.png differ diff --git a/unitree_g1_pack_camera/case1/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/0.h5 b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/0.h5 new file mode 100644 index 0000000..a5bf1f7 Binary files /dev/null and b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/0.h5 differ diff --git a/unitree_g1_pack_camera/case1/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors new file mode 100644 index 0000000..4bdf81f Binary files /dev/null and b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors differ diff --git a/unitree_g1_pack_camera/case1/world_model_interaction_prompts/unitree_g1_pack_camera.csv 
b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/unitree_g1_pack_camera.csv new file mode 100644 index 0000000..2bdc1cd --- /dev/null +++ b/unitree_g1_pack_camera/case1/world_model_interaction_prompts/unitree_g1_pack_camera.csv @@ -0,0 +1,2 @@ +videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps +0,x,x,unitree_g1_pack_camera,mount camera,x,x,x,G1_Dex1,30 diff --git a/unitree_g1_pack_camera/case2/run_world_model_interaction.sh b/unitree_g1_pack_camera/case2/run_world_model_interaction.sh new file mode 100644 index 0000000..36e613d --- /dev/null +++ b/unitree_g1_pack_camera/case2/run_world_model_interaction.sh @@ -0,0 +1,24 @@ +res_dir="unitree_g1_pack_camera/case2" +dataset="unitree_g1_pack_camera" + +{ + time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \ + --seed 123 \ + --ckpt_path ckpts/unifolm_wma_dual.ckpt \ + --config configs/inference/world_model_interaction.yaml \ + --savedir "${res_dir}/output" \ + --bs 1 --height 320 --width 512 \ + --unconditional_guidance_scale 1.0 \ + --ddim_steps 50 \ + --ddim_eta 1.0 \ + --prompt_dir "unitree_g1_pack_camera/case2/world_model_interaction_prompts" \ + --dataset ${dataset} \ + --video_length 16 \ + --frame_stride 6 \ + --n_action_steps 16 \ + --exe_steps 16 \ + --n_iter 11 \ + --timestep_spacing 'uniform_trailing' \ + --guidance_rescale 0.7 \ + --perframe_ae +} 2>&1 | tee "${res_dir}/output.log" diff --git a/unitree_g1_pack_camera/case2/world_model_interaction_prompts/images/unitree_g1_pack_camera/50.png b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/images/unitree_g1_pack_camera/50.png new file mode 100644 index 0000000..83eebaf Binary files /dev/null and b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/images/unitree_g1_pack_camera/50.png differ diff --git a/unitree_g1_pack_camera/case2/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/50.h5 
b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/50.h5
new file mode 100644
index 0000000..90e741b
Binary files /dev/null and b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/50.h5 differ
diff --git a/unitree_g1_pack_camera/case2/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors
new file mode 100644
index 0000000..4bdf81f
Binary files /dev/null and b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors differ
diff --git a/unitree_g1_pack_camera/case2/world_model_interaction_prompts/unitree_g1_pack_camera.csv b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/unitree_g1_pack_camera.csv
new file mode 100644
index 0000000..35ead3a
--- /dev/null
+++ b/unitree_g1_pack_camera/case2/world_model_interaction_prompts/unitree_g1_pack_camera.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+50,x,x,unitree_g1_pack_camera,mount camera,x,x,x,G1_Dex1,30
diff --git a/unitree_g1_pack_camera/case3/run_world_model_interaction.sh b/unitree_g1_pack_camera/case3/run_world_model_interaction.sh
new file mode 100644
index 0000000..87e3098
--- /dev/null
+++ b/unitree_g1_pack_camera/case3/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_g1_pack_camera/case3"
+dataset="unitree_g1_pack_camera"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_g1_pack_camera/case3/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 6 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 11 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_g1_pack_camera/case3/world_model_interaction_prompts/images/unitree_g1_pack_camera/100.png b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/images/unitree_g1_pack_camera/100.png
new file mode 100644
index 0000000..2f658f3
Binary files /dev/null and b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/images/unitree_g1_pack_camera/100.png differ
diff --git a/unitree_g1_pack_camera/case3/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/100.h5 b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/100.h5
new file mode 100644
index 0000000..f976464
Binary files /dev/null and b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/100.h5 differ
diff --git a/unitree_g1_pack_camera/case3/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors
new file mode 100644
index 0000000..4bdf81f
Binary files /dev/null and b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors differ
diff --git a/unitree_g1_pack_camera/case3/world_model_interaction_prompts/unitree_g1_pack_camera.csv b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/unitree_g1_pack_camera.csv
new file mode 100644
index 0000000..c6350c9
--- /dev/null
+++ b/unitree_g1_pack_camera/case3/world_model_interaction_prompts/unitree_g1_pack_camera.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+100,x,x,unitree_g1_pack_camera,mount camera,x,x,x,G1_Dex1,30
diff --git a/unitree_g1_pack_camera/case4/run_world_model_interaction.sh b/unitree_g1_pack_camera/case4/run_world_model_interaction.sh
new file mode 100644
index 0000000..46c5217
--- /dev/null
+++ b/unitree_g1_pack_camera/case4/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_g1_pack_camera/case4"
+dataset="unitree_g1_pack_camera"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_g1_pack_camera/case4/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 6 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 11 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_g1_pack_camera/case4/world_model_interaction_prompts/images/unitree_g1_pack_camera/200.png b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/images/unitree_g1_pack_camera/200.png
new file mode 100644
index 0000000..3c718aa
Binary files /dev/null and b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/images/unitree_g1_pack_camera/200.png differ
diff --git a/unitree_g1_pack_camera/case4/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/200.h5 b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/200.h5
new file mode 100644
index 0000000..606c218
Binary files /dev/null and b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/200.h5 differ
diff --git a/unitree_g1_pack_camera/case4/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors
new file mode 100644
index 0000000..4bdf81f
Binary files /dev/null and b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/transitions/unitree_g1_pack_camera/meta_data/stats.safetensors differ
diff --git a/unitree_g1_pack_camera/case4/world_model_interaction_prompts/unitree_g1_pack_camera.csv b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/unitree_g1_pack_camera.csv
new file mode 100644
index 0000000..1fae9f0
--- /dev/null
+++ b/unitree_g1_pack_camera/case4/world_model_interaction_prompts/unitree_g1_pack_camera.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+200,x,x,unitree_g1_pack_camera,mount camera,x,x,x,G1_Dex1,30
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case1/run_world_model_interaction.sh b/unitree_z1_dual_arm_cleanup_pencils/case1/run_world_model_interaction.sh
new file mode 100644
index 0000000..8fe141f
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case1/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_cleanup_pencils/case1"
+dataset="unitree_z1_dual_arm_cleanup_pencils"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 4 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 8 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/0.png b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/0.png
new file mode 100644
index 0000000..2d8739d
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/0.png differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/0.h5 b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/0.h5
new file mode 100644
index 0000000..6b120eb
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/0.h5 differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors
new file mode 100644
index 0000000..e3194ab
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
new file mode 100644
index 0000000..a749385
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case1/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+0,x,x,unitree_z1_dual_arm_cleanup_pencils,clean up eraser and pencils,x,x,x,Z1_Dual_Dex1,30
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case2/run_world_model_interaction.sh b/unitree_z1_dual_arm_cleanup_pencils/case2/run_world_model_interaction.sh
new file mode 100644
index 0000000..2b84103
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case2/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_cleanup_pencils/case2"
+dataset="unitree_z1_dual_arm_cleanup_pencils"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 4 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 8 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/50.png b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/50.png
new file mode 100644
index 0000000..91725eb
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/50.png differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/50.h5 b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/50.h5
new file mode 100644
index 0000000..6c08657
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/50.h5 differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors
new file mode 100644
index 0000000..e3194ab
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
new file mode 100644
index 0000000..a754862
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case2/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+50,x,x,unitree_z1_dual_arm_cleanup_pencils,clean up eraser and pencils,x,x,x,Z1_Dual_Dex1,30
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case3/run_world_model_interaction.sh b/unitree_z1_dual_arm_cleanup_pencils/case3/run_world_model_interaction.sh
new file mode 100644
index 0000000..78c56d7
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case3/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_cleanup_pencils/case3"
+dataset="unitree_z1_dual_arm_cleanup_pencils"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 4 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 8 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/100.png b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/100.png
new file mode 100644
index 0000000..7cc656f
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/100.png differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/100.h5 b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/100.h5
new file mode 100644
index 0000000..185d89b
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/100.h5 differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors
new file mode 100644
index 0000000..e3194ab
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
new file mode 100644
index 0000000..3462452
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case3/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+100,x,x,unitree_z1_dual_arm_cleanup_pencils,clean up eraser and pencils,x,x,x,Z1_Dual_Dex1,30
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case4/run_world_model_interaction.sh b/unitree_z1_dual_arm_cleanup_pencils/case4/run_world_model_interaction.sh
new file mode 100644
index 0000000..9367c09
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case4/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_cleanup_pencils/case4"
+dataset="unitree_z1_dual_arm_cleanup_pencils"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 4 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 8 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/200.png b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/200.png
new file mode 100644
index 0000000..9934a16
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_cleanup_pencils/200.png differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/200.h5 b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/200.h5
new file mode 100644
index 0000000..97ccecc
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/200.h5 differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors
new file mode 100644
index 0000000..e3194ab
Binary files /dev/null and b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_cleanup_pencils/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
new file mode 100644
index 0000000..498d7f1
--- /dev/null
+++ b/unitree_z1_dual_arm_cleanup_pencils/case4/world_model_interaction_prompts/unitree_z1_dual_arm_cleanup_pencils.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+200,x,x,unitree_z1_dual_arm_cleanup_pencils,clean up eraser and pencils,x,x,x,Z1_Dual_Dex1,30
diff --git a/unitree_z1_dual_arm_stackbox/case1/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox/case1/run_world_model_interaction.sh
new file mode 100644
index 0000000..0d9ed4c
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case1/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox/case1"
+dataset="unitree_z1_dual_arm_stackbox"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 4 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 7 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/5.png b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/5.png
new file mode 100644
index 0000000..eb6e272
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/5.png differ
diff --git a/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/5.h5 b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/5.h5
new file mode 100644
index 0000000..af951c1
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/5.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors
new file mode 100644
index 0000000..fa7fd40
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
new file mode 100644
index 0000000..6e7f0a8
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case1/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+5,x,x,unitree_z1_dual_arm_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox/case2/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox/case2/run_world_model_interaction.sh
new file mode 100644
index 0000000..7b6d005
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case2/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox/case2"
+dataset="unitree_z1_dual_arm_stackbox"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 4 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 7 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/15.png b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/15.png
new file mode 100644
index 0000000..676341b
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/15.png differ
diff --git a/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/15.h5 b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/15.h5
new file mode 100644
index 0000000..bf66fa5
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/15.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors
new file mode 100644
index 0000000..fa7fd40
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
new file mode 100644
index 0000000..79f4f8c
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case2/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+15,x,x,unitree_z1_dual_arm_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox/case3/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox/case3/run_world_model_interaction.sh
new file mode 100644
index 0000000..1058f25
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case3/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox/case3"
+dataset="unitree_z1_dual_arm_stackbox"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 4 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 7 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/25.png b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/25.png
new file mode 100644
index 0000000..5540f09
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/25.png differ
diff --git a/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/25.h5 b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/25.h5
new file mode 100644
index 0000000..8a6ca42
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/25.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors
new file mode 100644
index 0000000..fa7fd40
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
new file mode 100644
index 0000000..3bbd2da
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case3/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+25,x,x,unitree_z1_dual_arm_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox/case4/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox/case4/run_world_model_interaction.sh
new file mode 100644
index 0000000..fa46100
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case4/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox/case4"
+dataset="unitree_z1_dual_arm_stackbox"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 4 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 7 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/35.png b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/35.png
new file mode 100644
index 0000000..f3ec0a3
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox/35.png differ
diff --git a/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/35.h5 b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/35.h5
new file mode 100644
index 0000000..875155b
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/35.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors
new file mode 100644
index 0000000..fa7fd40
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
new file mode 100644
index 0000000..f22144c
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox/case4/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+35,x,x,unitree_z1_dual_arm_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox_v2/case1/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox_v2/case1/run_world_model_interaction.sh
new file mode 100644
index 0000000..bdcbbff
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case1/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox_v2/case1"
+dataset="unitree_z1_dual_arm_stackbox_v2"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 4 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 11 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/5.png b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/5.png
new file mode 100644
index 0000000..2371c4d
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/5.png differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/5.h5 b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/5.h5
new file mode 100644
index 0000000..a999fc7
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/5.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors
new file mode 100644
index 0000000..6ef7a6c
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv
new file mode 100644
index 0000000..4591e75
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case1/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+5,x,x,unitree_z1_dual_arm_stackbox_v2,"Stack the blocks in the rectangular block: red at the bottom, yellow in the middle, green on top",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox_v2/case2/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox_v2/case2/run_world_model_interaction.sh
new file mode 100644
index 0000000..2c94946
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case2/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox_v2/case2"
+dataset="unitree_z1_dual_arm_stackbox_v2"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir "unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts" \
+        --dataset ${dataset} \
+        --video_length 16 \
+        --frame_stride 4 \
+        --n_action_steps 16 \
+        --exe_steps 16 \
+        --n_iter 11 \
+        --timestep_spacing 'uniform_trailing' \
+        --guidance_rescale 0.7 \
+        --perframe_ae
+} 2>&1 | tee "${res_dir}/output.log"
diff --git a/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/15.png b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/15.png
new file mode 100644
index 0000000..aab83f1
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/15.png differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/15.h5 b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/15.h5
new file mode 100644
index 0000000..0a6bb8f
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/15.h5 differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors
new file mode 100644
index 0000000..6ef7a6c
Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors differ
diff --git a/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv
new file mode 100644
index 0000000..8cc81d4
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case2/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv
@@ -0,0 +1,2 @@
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps
+15,x,x,unitree_z1_dual_arm_stackbox_v2,"Stack the blocks in the rectangular block: red at the bottom, yellow in the middle, green on top",x,x,x,Unitree Z1 Robot Dual-Arm,30
diff --git a/unitree_z1_dual_arm_stackbox_v2/case3/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox_v2/case3/run_world_model_interaction.sh
new file mode 100644
index 0000000..6708ee9
--- /dev/null
+++ b/unitree_z1_dual_arm_stackbox_v2/case3/run_world_model_interaction.sh
@@ -0,0 +1,24 @@
+res_dir="unitree_z1_dual_arm_stackbox_v2/case3"
+dataset="unitree_z1_dual_arm_stackbox_v2"
+
+{
+    time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \
+        --seed 123 \
+        --ckpt_path ckpts/unifolm_wma_dual.ckpt \
+        --config configs/inference/world_model_interaction.yaml \
+        --savedir "${res_dir}/output" \
+        --bs 1 --height 320 --width 512 \
+        --unconditional_guidance_scale 1.0 \
+        --ddim_steps 50 \
+        --ddim_eta 1.0 \
+        --prompt_dir
"unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts" \ + --dataset ${dataset} \ + --video_length 16 \ + --frame_stride 4 \ + --n_action_steps 16 \ + --exe_steps 16 \ + --n_iter 11 \ + --timestep_spacing 'uniform_trailing' \ + --guidance_rescale 0.7 \ + --perframe_ae +} 2>&1 | tee "${res_dir}/output.log" diff --git a/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/25.png b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/25.png new file mode 100644 index 0000000..f800036 Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/25.png differ diff --git a/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/25.h5 b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/25.h5 new file mode 100644 index 0000000..966e7cc Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/25.h5 differ diff --git a/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors new file mode 100644 index 0000000..6ef7a6c Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors differ diff --git a/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv new file mode 100644 index 
0000000..4e1d4ee --- /dev/null +++ b/unitree_z1_dual_arm_stackbox_v2/case3/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv @@ -0,0 +1,2 @@ +videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps +25,x,x,unitree_z1_dual_arm_stackbox_v2,"Stack the blocks in the rectangular block: red at the bottom, yellow in the middle, green on top",x,x,x,Unitree Z1 Robot Dual-Arm,30 diff --git a/unitree_z1_dual_arm_stackbox_v2/case4/run_world_model_interaction.sh b/unitree_z1_dual_arm_stackbox_v2/case4/run_world_model_interaction.sh new file mode 100644 index 0000000..370c1c3 --- /dev/null +++ b/unitree_z1_dual_arm_stackbox_v2/case4/run_world_model_interaction.sh @@ -0,0 +1,24 @@ +res_dir="unitree_z1_dual_arm_stackbox_v2/case4" +dataset="unitree_z1_dual_arm_stackbox_v2" + +{ + time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \ + --seed 123 \ + --ckpt_path ckpts/unifolm_wma_dual.ckpt \ + --config configs/inference/world_model_interaction.yaml \ + --savedir "${res_dir}/output" \ + --bs 1 --height 320 --width 512 \ + --unconditional_guidance_scale 1.0 \ + --ddim_steps 50 \ + --ddim_eta 1.0 \ + --prompt_dir "unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts" \ + --dataset ${dataset} \ + --video_length 16 \ + --frame_stride 4 \ + --n_action_steps 16 \ + --exe_steps 16 \ + --n_iter 11 \ + --timestep_spacing 'uniform_trailing' \ + --guidance_rescale 0.7 \ + --perframe_ae +} 2>&1 | tee "${res_dir}/output.log" diff --git a/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/35.png b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/35.png new file mode 100644 index 0000000..d760f72 Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/images/unitree_z1_dual_arm_stackbox_v2/35.png differ diff 
--git a/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/35.h5 b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/35.h5 new file mode 100644 index 0000000..d9adda8 Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/35.h5 differ diff --git a/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors new file mode 100644 index 0000000..6ef7a6c Binary files /dev/null and b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/transitions/unitree_z1_dual_arm_stackbox_v2/meta_data/stats.safetensors differ diff --git a/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv new file mode 100644 index 0000000..43c4b92 --- /dev/null +++ b/unitree_z1_dual_arm_stackbox_v2/case4/world_model_interaction_prompts/unitree_z1_dual_arm_stackbox_v2.csv @@ -0,0 +1,2 @@ +videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps +35,x,x,unitree_z1_dual_arm_stackbox_v2,"Stack the blocks in the rectangular block: red at the bottom, yellow in the middle, green on top",x,x,x,Unitree Z1 Robot Dual-Arm,30 diff --git a/unitree_z1_stackbox/case1/run_world_model_interaction.sh b/unitree_z1_stackbox/case1/run_world_model_interaction.sh new file mode 100644 index 0000000..73d9132 --- /dev/null +++ b/unitree_z1_stackbox/case1/run_world_model_interaction.sh @@ -0,0 +1,24 @@ +res_dir="unitree_z1_stackbox/case1" 
+dataset="unitree_z1_stackbox" + +{ + time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \ + --seed 123 \ + --ckpt_path ckpts/unifolm_wma_dual.ckpt \ + --config configs/inference/world_model_interaction.yaml \ + --savedir "${res_dir}/output" \ + --bs 1 --height 320 --width 512 \ + --unconditional_guidance_scale 1.0 \ + --ddim_steps 50 \ + --ddim_eta 1.0 \ + --prompt_dir "unitree_z1_stackbox/case1/world_model_interaction_prompts" \ + --dataset ${dataset} \ + --video_length 16 \ + --frame_stride 4 \ + --n_action_steps 16 \ + --exe_steps 16 \ + --n_iter 12 \ + --timestep_spacing 'uniform_trailing' \ + --guidance_rescale 0.7 \ + --perframe_ae +} 2>&1 | tee "${res_dir}/output.log" diff --git a/unitree_z1_stackbox/case1/world_model_interaction_prompts/images/unitree_z1_stackbox/5.png b/unitree_z1_stackbox/case1/world_model_interaction_prompts/images/unitree_z1_stackbox/5.png new file mode 100644 index 0000000..8e265c0 Binary files /dev/null and b/unitree_z1_stackbox/case1/world_model_interaction_prompts/images/unitree_z1_stackbox/5.png differ diff --git a/unitree_z1_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_stackbox/5.h5 b/unitree_z1_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_stackbox/5.h5 new file mode 100644 index 0000000..fa647f1 Binary files /dev/null and b/unitree_z1_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_stackbox/5.h5 differ diff --git a/unitree_z1_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors b/unitree_z1_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors new file mode 100644 index 0000000..1918ea0 Binary files /dev/null and b/unitree_z1_stackbox/case1/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors differ diff --git 
a/unitree_z1_stackbox/case1/world_model_interaction_prompts/unitree_z1_stackbox.csv b/unitree_z1_stackbox/case1/world_model_interaction_prompts/unitree_z1_stackbox.csv new file mode 100644 index 0000000..8f55185 --- /dev/null +++ b/unitree_z1_stackbox/case1/world_model_interaction_prompts/unitree_z1_stackbox.csv @@ -0,0 +1,2 @@ +videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps +5,x,x,unitree_z1_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Arm,30 diff --git a/unitree_z1_stackbox/case2/run_world_model_interaction.sh b/unitree_z1_stackbox/case2/run_world_model_interaction.sh new file mode 100644 index 0000000..95fb33b --- /dev/null +++ b/unitree_z1_stackbox/case2/run_world_model_interaction.sh @@ -0,0 +1,24 @@ +res_dir="unitree_z1_stackbox/case2" +dataset="unitree_z1_stackbox" + +{ + time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \ + --seed 123 \ + --ckpt_path ckpts/unifolm_wma_dual.ckpt \ + --config configs/inference/world_model_interaction.yaml \ + --savedir "${res_dir}/output" \ + --bs 1 --height 320 --width 512 \ + --unconditional_guidance_scale 1.0 \ + --ddim_steps 50 \ + --ddim_eta 1.0 \ + --prompt_dir "unitree_z1_stackbox/case2/world_model_interaction_prompts" \ + --dataset ${dataset} \ + --video_length 16 \ + --frame_stride 4 \ + --n_action_steps 16 \ + --exe_steps 16 \ + --n_iter 12 \ + --timestep_spacing 'uniform_trailing' \ + --guidance_rescale 0.7 \ + --perframe_ae +} 2>&1 | tee "${res_dir}/output.log" diff --git a/unitree_z1_stackbox/case2/world_model_interaction_prompts/images/unitree_z1_stackbox/15.png b/unitree_z1_stackbox/case2/world_model_interaction_prompts/images/unitree_z1_stackbox/15.png new file mode 100644 index 0000000..2b7be22 Binary files /dev/null and b/unitree_z1_stackbox/case2/world_model_interaction_prompts/images/unitree_z1_stackbox/15.png differ diff --git 
a/unitree_z1_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_stackbox/15.h5 b/unitree_z1_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_stackbox/15.h5 new file mode 100644 index 0000000..4a71e9f Binary files /dev/null and b/unitree_z1_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_stackbox/15.h5 differ diff --git a/unitree_z1_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors b/unitree_z1_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors new file mode 100644 index 0000000..1918ea0 Binary files /dev/null and b/unitree_z1_stackbox/case2/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors differ diff --git a/unitree_z1_stackbox/case2/world_model_interaction_prompts/unitree_z1_stackbox.csv b/unitree_z1_stackbox/case2/world_model_interaction_prompts/unitree_z1_stackbox.csv new file mode 100644 index 0000000..bde4468 --- /dev/null +++ b/unitree_z1_stackbox/case2/world_model_interaction_prompts/unitree_z1_stackbox.csv @@ -0,0 +1,2 @@ +videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps +15,x,x,unitree_z1_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Arm,30 diff --git a/unitree_z1_stackbox/case3/run_world_model_interaction.sh b/unitree_z1_stackbox/case3/run_world_model_interaction.sh new file mode 100644 index 0000000..d92501c --- /dev/null +++ b/unitree_z1_stackbox/case3/run_world_model_interaction.sh @@ -0,0 +1,24 @@ +res_dir="unitree_z1_stackbox/case3" +dataset="unitree_z1_stackbox" + +{ + time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \ + --seed 123 \ + --ckpt_path ckpts/unifolm_wma_dual.ckpt \ + --config configs/inference/world_model_interaction.yaml \ + --savedir "${res_dir}/output" \ + --bs 1 --height 320 --width 512 \ + 
--unconditional_guidance_scale 1.0 \ + --ddim_steps 50 \ + --ddim_eta 1.0 \ + --prompt_dir "unitree_z1_stackbox/case3/world_model_interaction_prompts" \ + --dataset ${dataset} \ + --video_length 16 \ + --frame_stride 4 \ + --n_action_steps 16 \ + --exe_steps 16 \ + --n_iter 12 \ + --timestep_spacing 'uniform_trailing' \ + --guidance_rescale 0.7 \ + --perframe_ae +} 2>&1 | tee "${res_dir}/output.log" diff --git a/unitree_z1_stackbox/case3/world_model_interaction_prompts/images/unitree_z1_stackbox/25.png b/unitree_z1_stackbox/case3/world_model_interaction_prompts/images/unitree_z1_stackbox/25.png new file mode 100644 index 0000000..1365fd5 Binary files /dev/null and b/unitree_z1_stackbox/case3/world_model_interaction_prompts/images/unitree_z1_stackbox/25.png differ diff --git a/unitree_z1_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_stackbox/25.h5 b/unitree_z1_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_stackbox/25.h5 new file mode 100644 index 0000000..27c0773 Binary files /dev/null and b/unitree_z1_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_stackbox/25.h5 differ diff --git a/unitree_z1_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors b/unitree_z1_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors new file mode 100644 index 0000000..1918ea0 Binary files /dev/null and b/unitree_z1_stackbox/case3/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors differ diff --git a/unitree_z1_stackbox/case3/world_model_interaction_prompts/unitree_z1_stackbox.csv b/unitree_z1_stackbox/case3/world_model_interaction_prompts/unitree_z1_stackbox.csv new file mode 100644 index 0000000..a32f631 --- /dev/null +++ b/unitree_z1_stackbox/case3/world_model_interaction_prompts/unitree_z1_stackbox.csv @@ -0,0 +1,2 @@ 
+videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps +25,x,x,unitree_z1_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Arm,30 diff --git a/unitree_z1_stackbox/case4/run_world_model_interaction.sh b/unitree_z1_stackbox/case4/run_world_model_interaction.sh new file mode 100644 index 0000000..054b175 --- /dev/null +++ b/unitree_z1_stackbox/case4/run_world_model_interaction.sh @@ -0,0 +1,24 @@ +res_dir="unitree_z1_stackbox/case4" +dataset="unitree_z1_stackbox" + +{ + time CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py \ + --seed 123 \ + --ckpt_path ckpts/unifolm_wma_dual.ckpt \ + --config configs/inference/world_model_interaction.yaml \ + --savedir "${res_dir}/output" \ + --bs 1 --height 320 --width 512 \ + --unconditional_guidance_scale 1.0 \ + --ddim_steps 50 \ + --ddim_eta 1.0 \ + --prompt_dir "unitree_z1_stackbox/case4/world_model_interaction_prompts" \ + --dataset ${dataset} \ + --video_length 16 \ + --frame_stride 4 \ + --n_action_steps 16 \ + --exe_steps 16 \ + --n_iter 12 \ + --timestep_spacing 'uniform_trailing' \ + --guidance_rescale 0.7 \ + --perframe_ae +} 2>&1 | tee "${res_dir}/output.log" diff --git a/unitree_z1_stackbox/case4/world_model_interaction_prompts/images/unitree_z1_stackbox/35.png b/unitree_z1_stackbox/case4/world_model_interaction_prompts/images/unitree_z1_stackbox/35.png new file mode 100644 index 0000000..67736af Binary files /dev/null and b/unitree_z1_stackbox/case4/world_model_interaction_prompts/images/unitree_z1_stackbox/35.png differ diff --git a/unitree_z1_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_stackbox/35.h5 b/unitree_z1_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_stackbox/35.h5 new file mode 100644 index 0000000..94322f7 Binary files /dev/null and b/unitree_z1_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_stackbox/35.h5 differ diff 
--git a/unitree_z1_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors b/unitree_z1_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors new file mode 100644 index 0000000..1918ea0 Binary files /dev/null and b/unitree_z1_stackbox/case4/world_model_interaction_prompts/transitions/unitree_z1_stackbox/meta_data/stats.safetensors differ diff --git a/unitree_z1_stackbox/case4/world_model_interaction_prompts/unitree_z1_stackbox.csv b/unitree_z1_stackbox/case4/world_model_interaction_prompts/unitree_z1_stackbox.csv new file mode 100644 index 0000000..2f0bbc0 --- /dev/null +++ b/unitree_z1_stackbox/case4/world_model_interaction_prompts/unitree_z1_stackbox.csv @@ -0,0 +1,2 @@ +videoid,contentUrl,duration,data_dir,instruction,dynamic_confidence,dynamic_wording,dynamic_source_category,embodiment,fps +35,x,x,unitree_z1_stackbox,"Pick up the red cup on the table.",x,x,x,Unitree Z1 Robot Arm,30
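
Review note: the `run_world_model_interaction.sh` scripts added in this diff appear to differ only in `res_dir`, `dataset`, the `--prompt_dir` path, and `--n_iter` (11 in the dual-arm scripts shown, 12 in the single-arm ones). As a minimal sketch of how they could be parameterized, the helper below prints the command for one case instead of executing it. The function names `wmi_command` and `wmi_all` are hypothetical and not part of the repository, and using `--n_iter 11` for the dual-arm `case1` is an assumption, since that script is not shown in this part of the diff.

```shell
#!/bin/sh
# Sketch only: print (never execute) the evaluation command for one case.
# wmi_command and wmi_all are hypothetical helper names, not part of the repo.

wmi_command() {
    wmi_dataset=$1   # dataset name, e.g. unitree_z1_stackbox
    wmi_case=$2      # case directory, e.g. case3
    wmi_iters=$3     # value passed to --n_iter
    wmi_res="${wmi_dataset}/${wmi_case}"
    # Flags and values are copied from the scripts in the diff.
    printf '%s ' \
        "CUDA_VISIBLE_DEVICES=0 python3 scripts/evaluation/world_model_interaction.py" \
        "--seed 123" \
        "--ckpt_path ckpts/unifolm_wma_dual.ckpt" \
        "--config configs/inference/world_model_interaction.yaml" \
        "--savedir ${wmi_res}/output" \
        "--bs 1 --height 320 --width 512" \
        "--unconditional_guidance_scale 1.0" \
        "--ddim_steps 50 --ddim_eta 1.0" \
        "--prompt_dir ${wmi_res}/world_model_interaction_prompts" \
        "--dataset ${wmi_dataset}" \
        "--video_length 16 --frame_stride 4" \
        "--n_action_steps 16 --exe_steps 16" \
        "--n_iter ${wmi_iters}" \
        "--timestep_spacing uniform_trailing" \
        "--guidance_rescale 0.7"
    printf '%s\n' "--perframe_ae"
}

wmi_all() {
    for ds in unitree_z1_dual_arm_stackbox_v2 unitree_z1_stackbox; do
        case "$ds" in
            # Assumption: every dual-arm case uses 11 iterations.
            unitree_z1_dual_arm_stackbox_v2) iters=11 ;;
            *) iters=12 ;;
        esac
        for c in case1 case2 case3 case4; do
            wmi_command "$ds" "$c" "$iters"
        done
    done
}

# Dry run: one command line per case, eight in total.
wmi_all
```

Printing rather than running keeps the sketch safe to try without the checkpoint or a GPU; each printed line could then be wrapped in the same `{ time ...; } 2>&1 | tee "${res_dir}/output.log"` pattern the committed scripts use.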