Compare commits
3 Commits
ef56e5dcdb ... qhy5

| Author | SHA1 | Date |
|---|---|---|
| | 5863fbb656 | |
| | d9d9537d33 | |
| | 202062a647 | |
```diff
@@ -1,21 +0,0 @@
-{
-  "permissions": {
-    "allow": [
-      "Bash(conda env list:*)",
-      "Bash(mamba env:*)",
-      "Bash(micromamba env list:*)",
-      "Bash(echo:*)",
-      "Bash(git show:*)",
-      "Bash(nvidia-smi:*)",
-      "Bash(conda activate unifolm-wma)",
-      "Bash(conda info:*)",
-      "Bash(direnv allow:*)",
-      "Bash(ls:*)",
-      "Bash(for scenario in unitree_g1_pack_camera unitree_z1_dual_arm_cleanup_pencils unitree_z1_dual_arm_stackbox unitree_z1_dual_arm_stackbox_v2 unitree_z1_stackbox)",
-      "Bash(do for case in case1 case2 case3 case4)",
-      "Bash(done)",
-      "Bash(chmod:*)",
-      "Bash(ln:*)"
-    ]
-  }
-}
```
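The `Bash(for scenario in …)`, `Bash(do for case in …)`, and `Bash(done)` entries in the deleted allow-list are fragments of a single nested loop that was split across permission entries. A hypothetical reconstruction of that loop, with the scenario and case names taken from the allow-list itself and an `echo` placeholder standing in for the unknown per-case command:

```shell
# Hypothetical reassembly of the permitted nested loop; the real loop body
# is not shown in the diff, so echo is used as a placeholder.
list_runs() {
  for scenario in unitree_g1_pack_camera unitree_z1_dual_arm_cleanup_pencils unitree_z1_dual_arm_stackbox unitree_z1_dual_arm_stackbox_v2 unitree_z1_stackbox; do
    for case in case1 case2 case3 case4; do
      echo "${scenario}/${case}"   # placeholder for the real per-case command
    done
  done
}
list_runs
```

This yields one line per (scenario, case) pair, i.e. 5 × 4 = 20 runs.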
2 .envrc

```diff
@@ -1,2 +0,0 @@
-eval "$(conda shell.bash hook 2>/dev/null)"
-conda activate unifolm-wma
```
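The deleted `.envrc` wires conda activation into direnv: the hook line loads conda's shell functions, after which the environment can be activated automatically on `cd`. A sketch of the equivalent setup, assuming conda and direnv are both installed (matching the `Bash(direnv allow:*)` permission above):

```shell
# .envrc, executed by direnv on entering the directory (after `direnv allow`)
eval "$(conda shell.bash hook 2>/dev/null)"   # load conda's shell integration
conda activate unifolm-wma                    # activate the project environment
```

Running `direnv allow` once in the repository root approves this file; direnv then re-activates the environment on every subsequent `cd` into the directory.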
7 .gitignore vendored

```diff
@@ -55,6 +55,7 @@ coverage.xml
 *.pot
 
 # Django stuff:
 local_settings.py
 db.sqlite3
 
@@ -120,15 +121,11 @@ localTest/
 fig/
 figure/
 *.mp4
 
 Data/ControlVAE.yml
 Data/Misc
 Data/Pretrained
 Data/utils.py
 Experiment/checkpoint
 Experiment/log
-*.ckpt
+ckpts/unifolm_wma_dual.ckpt
 
 *.0
-ckpts/unifolm_wma_dual.ckpt.prepared.pt
```
439 ckpts/LICENSE

```diff
@@ -1,439 +0,0 @@
-Attribution-NonCommercial-ShareAlike 4.0 International
-
-Copyright (c) 2016-2025 HangZhou YuShu TECHNOLOGY CO.,LTD. ("Unitree Robotics")
-
-(The remaining lines are the verbatim standard text of the Creative Commons
-Attribution-NonCommercial-ShareAlike 4.0 International Public License,
-Sections 1-8 plus the closing Creative Commons notice, all deleted with the file.)
```
````diff
@@ -1,38 +0,0 @@
----
-tags:
-- robotics
----
-
-# UnifoLM-WMA-0: A World-Model-Action (WMA) Framework under UnifoLM Family
-
-<p style="font-size: 1.2em;">
-  <a href="https://unigen-x.github.io/unifolm-world-model-action.github.io"><strong>Project Page</strong></a> |
-  <a href="https://github.com/unitreerobotics/unifolm-world-model-action"><strong>Code</strong></a> |
-  <a href="https://huggingface.co/unitreerobotics/datasets"><strong>Dataset</strong></a>
-</p>
-
-<div align="center">
-<div align="justify">
-<b>UnifoLM-WMA-0</b> is Unitree's first open-source world-model-action architecture spanning multiple types of robotic embodiments, designed specifically for general-purpose robot learning. Its core component is a world-model capable of understanding the physical interactions between robots and the environments. This world-model provides two key functions: (a) <b>Simulation Engine</b> - operates as an interactive simulator to generate synthetic data for robot learning; (b) <b>Policy Enhancement</b> - connects with an action head and, by predicting future interaction processes with the world-model, further optimizes decision-making performance.
-</div>
-</div>
-
-## 🦾 Real Robot Deployment
-
-| <img src="assets/real_z1_stackbox.gif" style="border:none;box-shadow:none;margin:0;padding:0;" /> | <img src="assets/real_dual_stackbox.gif" style="border:none;box-shadow:none;margin:0;padding:0;" /> |
-|:---:|:---:|
-| <img src="assets/real_cleanup_pencils.gif" style="border:none;box-shadow:none;margin:0;padding:0;" /> | <img src="assets/real_g1_pack_camera.gif" style="border:none;box-shadow:none;margin:0;padding:0;" /> |
-
-**Note: the top-right window shows the world model's prediction of future environmental changes.**
-
-## License
-The model is released under the CC BY-NC-SA 4.0 license as found in the [LICENSE](https://huggingface.co/unitreerobotics/UnifoLM-WMA-0/blob/main/LICENSE). You are responsible for ensuring that your use of Unitree AI Models complies with all applicable laws.
-
-## Model Architecture
-![image](assets/model_framework.png)
-
-## Citation
-```
-@misc{unifolm-wma-0,
-    author = {Unitree},
-    title = {UnifoLM-WMA-0: A World-Model-Action (WMA) Framework under UnifoLM Family},
-    year = {2025},
-}
-```
````
5 binary files changed, not shown (before sizes: 22 MiB, 28 MiB, 25 MiB, 15 MiB, 4.3 MiB).
```diff
@@ -19,13 +19,13 @@ dependencies = [
     "pytorch-lightning==1.9.3",
     "pyyaml==6.0",
     "setuptools==65.6.3",
-    #"torch==2.3.1",
-    #"torchvision==0.18.1",
+    "torch==2.3.1",
+    "torchvision==0.18.1",
     "tqdm==4.66.5",
     "transformers==4.40.1",
     "moviepy==1.0.3",
     "av==12.3.0",
-    #"xformers==0.0.27",
+    "xformers==0.0.27",
     "gradio==4.39.0",
     "timm==0.9.10",
     "scikit-learn==1.5.1",
```

File diff suppressed because it is too large
@@ -1,37 +0,0 @@
|
|||||||
2026-02-11 17:34:29.188470: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
|
|
||||||
2026-02-11 17:34:29.238296: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
|
|
||||||
2026-02-11 17:34:29.238342: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2026-02-11 17:34:29.239649: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2026-02-11 17:34:29.247152: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2026-02-11 17:34:30.172640: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Global seed set to 123
>>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
>>> Prepared model loaded.
>>> Diffusion backbone (model.model) converted to FP16.
>>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
>>> Encoders (cond_stage_model, embedder) converted to FP16.
INFO:root:***** Configing Data *****
>>> unitree_z1_stackbox: 1 data samples loaded.
>>> unitree_z1_stackbox: data stats loaded.
>>> unitree_z1_stackbox: normalizer initiated.
>>> unitree_z1_dual_arm_stackbox: 1 data samples loaded.
>>> unitree_z1_dual_arm_stackbox: data stats loaded.
>>> unitree_z1_dual_arm_stackbox: normalizer initiated.
>>> unitree_z1_dual_arm_stackbox_v2: 1 data samples loaded.
>>> unitree_z1_dual_arm_stackbox_v2: data stats loaded.
>>> unitree_z1_dual_arm_stackbox_v2: normalizer initiated.
>>> unitree_z1_dual_arm_cleanup_pencils: 1 data samples loaded.
>>> unitree_z1_dual_arm_cleanup_pencils: data stats loaded.
>>> unitree_z1_dual_arm_cleanup_pencils: normalizer initiated.
>>> unitree_g1_pack_camera: 1 data samples loaded.
>>> unitree_g1_pack_camera: data stats loaded.
>>> unitree_g1_pack_camera: normalizer initiated.
>>> Dataset is successfully loaded ...
✓ KV fused: 66 attention layers
>>> Generate 16 frames under each generation ...
DEBUG:h5py._conv:Creating converter from 3 to 5
DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
run_all_psnr.sh (Executable file → Normal file): file diff suppressed because it is too large.
@@ -16,9 +16,6 @@ from collections import OrderedDict
 from unifolm_wma.models.samplers.ddim import DDIMSampler
 from unifolm_wma.utils.utils import instantiate_from_config
 
-torch.backends.cuda.matmul.allow_tf32 = True
-torch.backends.cudnn.allow_tf32 = True
-
 
 def get_filelist(data_dir: str, postfixes: list[str]) -> list[str]:
     """
@@ -19,9 +19,6 @@ from fastapi.responses import JSONResponse
 from typing import Any, Dict, Optional, Tuple, List
 from datetime import datetime
 
-torch.backends.cuda.matmul.allow_tf32 = True
-torch.backends.cudnn.allow_tf32 = True
-
 from unifolm_wma.utils.utils import instantiate_from_config
 from unifolm_wma.models.samplers.ddim import DDIMSampler
 
@@ -9,8 +9,6 @@ import logging
 import einops
 import warnings
 import imageio
-import atexit
-from concurrent.futures import ThreadPoolExecutor
 
 from pytorch_lightning import seed_everything
 from omegaconf import OmegaConf
@@ -18,12 +16,8 @@ from tqdm import tqdm
 from einops import rearrange, repeat
 from collections import OrderedDict
 from torch import nn
-from eval_utils import populate_queues
+from eval_utils import populate_queues, log_to_tensorboard
 from collections import deque
-from typing import Optional, List, Any
-
-torch.backends.cuda.matmul.allow_tf32 = True
-torch.backends.cudnn.allow_tf32 = True
 from torch import Tensor
 from torch.utils.tensorboard import SummaryWriter
 from PIL import Image
@@ -156,81 +150,6 @@ def save_results(video: Tensor, filename: str, fps: int = 8) -> None:
                                options={'crf': '10'})
 
 
-# ========== Async I/O ==========
-_io_executor: Optional[ThreadPoolExecutor] = None
-_io_futures: List[Any] = []
-
-
-def _get_io_executor() -> ThreadPoolExecutor:
-    global _io_executor
-    if _io_executor is None:
-        _io_executor = ThreadPoolExecutor(max_workers=2)
-    return _io_executor
-
-
-def _flush_io():
-    """Wait for all pending async I/O to finish."""
-    global _io_futures
-    for fut in _io_futures:
-        try:
-            fut.result()
-        except Exception as e:
-            print(f">>> [async I/O] error: {e}")
-    _io_futures.clear()
-
-
-atexit.register(_flush_io)
-
-
-def _save_results_sync(video_cpu: Tensor, filename: str, fps: int) -> None:
-    """Synchronous save on CPU tensor (runs in background thread)."""
-    video = torch.clamp(video_cpu.float(), -1., 1.)
-    n = video.shape[0]
-    video = video.permute(2, 0, 1, 3, 4)
-    frame_grids = [
-        torchvision.utils.make_grid(framesheet, nrow=int(n), padding=0)
-        for framesheet in video
-    ]
-    grid = torch.stack(frame_grids, dim=0)
-    grid = (grid + 1.0) / 2.0
-    grid = (grid * 255).to(torch.uint8).permute(0, 2, 3, 1)
-    torchvision.io.write_video(filename,
-                               grid,
-                               fps=fps,
-                               video_codec='h264',
-                               options={'crf': '10'})
-
-
-def save_results_async(video: Tensor, filename: str, fps: int = 8) -> None:
-    """Submit video saving to background thread pool."""
-    video_cpu = video.detach().cpu()
-    fut = _get_io_executor().submit(_save_results_sync, video_cpu, filename, fps)
-    _io_futures.append(fut)
-
-
-def _log_to_tb_sync(writer, video_cpu: Tensor, tag: str, fps: int) -> None:
-    """Synchronous TensorBoard log on CPU tensor (runs in background thread)."""
-    if video_cpu.dim() == 5:
-        n = video_cpu.shape[0]
-        video = video_cpu.permute(2, 0, 1, 3, 4)
-        frame_grids = [
-            torchvision.utils.make_grid(framesheet, nrow=int(n), padding=0)
-            for framesheet in video
-        ]
-        grid = torch.stack(frame_grids, dim=0)
-        grid = (grid + 1.0) / 2.0
-        grid = grid.unsqueeze(dim=0)
-        writer.add_video(tag, grid, fps=fps)
-
-
-def log_to_tensorboard_async(writer, data: Tensor, tag: str, fps: int = 10) -> None:
-    """Submit TensorBoard logging to background thread pool."""
-    if isinstance(data, torch.Tensor) and data.dim() == 5:
-        data_cpu = data.detach().cpu()
-        fut = _get_io_executor().submit(_log_to_tb_sync, writer, data_cpu, tag, fps)
-        _io_futures.append(fut)
-
-
 def get_init_frame_path(data_dir: str, sample: dict) -> str:
     """Construct the init_frame path from directory and sample metadata.
 
@@ -408,8 +327,7 @@ def image_guided_synthesis_sim_mode(
         timestep_spacing: str = 'uniform',
         guidance_rescale: float = 0.0,
         sim_mode: bool = True,
-        decode_video: bool = True,
-        **kwargs) -> tuple[torch.Tensor | None, torch.Tensor, torch.Tensor]:
+        **kwargs) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
     """
     Performs image-guided video generation in a simulation-style mode with optional multimodal guidance (image, state, action, text).
 
@@ -432,13 +350,10 @@
         timestep_spacing (str): Timestep sampling method in DDIM sampler. Typically "uniform" or "linspace".
         guidance_rescale (float): Guidance rescaling factor to mitigate overexposure from classifier-free guidance.
         sim_mode (bool): Whether to perform world-model interaction or decision-making using the world-model.
-        decode_video (bool): Whether to decode latent samples to pixel-space video.
-            Set to False to skip VAE decode for speed when only actions/states are needed.
         **kwargs: Additional arguments passed to the DDIM sampler.
 
     Returns:
-        batch_variants (torch.Tensor | None): Predicted pixel-space video frames [B, C, T, H, W],
-            or None when decode_video=False.
+        batch_variants (torch.Tensor): Predicted pixel-space video frames [B, C, T, H, W].
         actions (torch.Tensor): Predicted action sequences [B, T, D] from diffusion decoding.
         states (torch.Tensor): Predicted state sequences [B, T, D] from diffusion decoding.
     """
@@ -450,7 +365,6 @@
 
     img = observation['observation.images.top'].permute(0, 2, 1, 3, 4)
     cond_img = rearrange(img, 'b o c h w -> (b o) c h w')[-1:]
-    with torch.cuda.amp.autocast(dtype=torch.float16):
-        cond_img_emb = model.embedder(cond_img)
-        cond_img_emb = model.image_proj_model(cond_img_emb)
+    cond_img_emb = model.embedder(cond_img)
+    cond_img_emb = model.image_proj_model(cond_img_emb)
 
@@ -466,7 +380,6 @@
         prompts = [""] * batch_size
         cond_ins_emb = model.get_learned_conditioning(prompts)
 
-    with torch.cuda.amp.autocast(dtype=torch.float16):
-        cond_state_emb = model.state_projector(observation['observation.state'])
-        cond_state_emb = cond_state_emb + model.agent_state_pos_emb
+    cond_state_emb = model.state_projector(observation['observation.state'])
+    cond_state_emb = cond_state_emb + model.agent_state_pos_emb
 
@@ -493,8 +406,6 @@
     kwargs.update({"unconditional_conditioning_img_nonetext": None})
     cond_mask = None
     cond_z0 = None
-    batch_variants = None
-    samples = None
     if ddim_sampler is not None:
         samples, actions, states, intermedia = ddim_sampler.sample(
             S=ddim_steps,
@@ -513,12 +424,11 @@
             guidance_rescale=guidance_rescale,
             **kwargs)
 
-    if decode_video:
-        # Reconstruct from latent to pixel space
-        batch_images = model.decode_first_stage(samples)
-        batch_variants = batch_images
+    # Reconstruct from latent to pixel space
+    batch_images = model.decode_first_stage(samples)
+    batch_variants = batch_images
 
-    return batch_variants, actions, states, samples
+    return batch_variants, actions, states
 
 
 def run_inference(args: argparse.Namespace, gpu_num: int, gpu_no: int) -> None:
@@ -543,67 +453,26 @@ def run_inference(args: argparse.Namespace, gpu_num: int, gpu_no: int) -> None:
     csv_path = os.path.join(args.prompt_dir, f"{args.dataset}.csv")
     df = pd.read_csv(csv_path)
 
-    # Load config (always needed for data setup)
+    # Load config
     config = OmegaConf.load(args.config)
 
-    prepared_path = args.ckpt_path + ".prepared.pt"
-    if os.path.exists(prepared_path):
-        # ---- Fast path: load the fully-prepared model ----
-        print(f">>> Loading prepared model from {prepared_path} ...")
-        model = torch.load(prepared_path,
-                           map_location=f"cuda:{gpu_no}",
-                           weights_only=False,
-                           mmap=True)
-        model.eval()
-        print(f">>> Prepared model loaded.")
-    else:
-        # ---- Normal path: construct + load checkpoint ----
-        config['model']['params']['wma_config']['params'][
-            'use_checkpoint'] = False
-        model = instantiate_from_config(config.model)
-        model.perframe_ae = args.perframe_ae
-
-        assert os.path.exists(args.ckpt_path), "Error: checkpoint Not Found!"
-        model = load_model_checkpoint(model, args.ckpt_path)
-        model.eval()
-        model = model.cuda(gpu_no)
-        print(f'>>> Load pre-trained model ...')
-
-        # Save prepared model for fast loading next time
-        print(f">>> Saving prepared model to {prepared_path} ...")
-        torch.save(model, prepared_path)
-        print(f">>> Prepared model saved ({os.path.getsize(prepared_path) / 1024**3:.1f} GB).")
-
-    # ---- FP16: convert diffusion backbone + conditioning modules ----
-    model.model.to(torch.float16)
-    model.model.diffusion_model.dtype = torch.float16
-    print(">>> Diffusion backbone (model.model) converted to FP16.")
-
-    # Projectors / MLP → FP16
-    model.image_proj_model.half()
-    model.state_projector.half()
-    model.action_projector.half()
-    print(">>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.")
-
-    # Text/image encoders → FP16
-    model.cond_stage_model.half()
-    model.embedder.half()
-    print(">>> Encoders (cond_stage_model, embedder) converted to FP16.")
-
-    # Build normalizer (always needed, independent of model loading path)
+    config['model']['params']['wma_config']['params'][
+        'use_checkpoint'] = False
+    model = instantiate_from_config(config.model)
+    model.perframe_ae = args.perframe_ae
+
+    assert os.path.exists(args.ckpt_path), "Error: checkpoint Not Found!"
+    model = load_model_checkpoint(model, args.ckpt_path)
+    model.eval()
+    print(f'>>> Load pre-trained model ...')
+
+    # Build unnomalizer
     logging.info("***** Configing Data *****")
     data = instantiate_from_config(config.data)
     data.setup()
     print(">>> Dataset is successfully loaded ...")
 
-    device = get_device_from_parameters(model)
-
-    # Fuse KV projections in attention layers (to_k + to_v → to_kv)
-    from unifolm_wma.modules.attention import CrossAttention
-    kv_count = sum(1 for m in model.modules()
-                   if isinstance(m, CrossAttention) and m.fuse_kv())
-    print(f" ✓ KV fused: {kv_count} attention layers")
+    model = model.cuda(gpu_no)
+    device = get_device_from_parameters(model)
 
     # Run over data
     assert (args.height % 16 == 0) and (
         args.width % 16
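The deleted fast path in the hunk above is a build-once, reload-later cache: construct and convert the model on the first run, serialize the whole prepared object next to the checkpoint, and reload it directly on later runs (the log at the top of this page shows the fast path firing). The original uses `torch.save`/`torch.load` with `mmap=True`; the sketch below shows the same control flow with stdlib `pickle`, and `load_or_build` is a hypothetical helper name:

```python
import os
import pickle
from typing import Any, Callable


def load_or_build(cache_path: str, build_fn: Callable[[], Any]) -> Any:
    """Reload a previously prepared object, or build and persist it once."""
    if os.path.exists(cache_path):
        # Fast path: skip the expensive construction entirely.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Slow path: build once, then cache the result for subsequent runs.
    obj = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(obj, f)
    return obj
```

The trade-off is a large on-disk artifact (the removed code reports its size in GB) in exchange for skipping config instantiation and checkpoint surgery at every startup; the cache must be invalidated manually when the checkpoint or code changes.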
@@ -649,7 +518,7 @@ def run_inference(args: argparse.Namespace, gpu_num: int, gpu_no: int) -> None:
         sample_save_dir = f'{video_save_dir}/wm/{fs}'
         os.makedirs(sample_save_dir, exist_ok=True)
         # For collecting interaction videos
-        wm_latent = []
+        wm_video = []
         # Initialize observation queues
         cond_obs_queues = {
             "observation.images.top":
@@ -705,7 +574,7 @@ def run_inference(args: argparse.Namespace, gpu_num: int, gpu_no: int) -> None:
 
             # Use world-model in policy to generate action
             print(f'>>> Step {itr}: generating actions ...')
-            pred_videos_0, pred_actions, _, _ = image_guided_synthesis_sim_mode(
+            pred_videos_0, pred_actions, _ = image_guided_synthesis_sim_mode(
                 model,
                 sample['instruction'],
                 observation,
@@ -718,8 +587,7 @@ def run_inference(args: argparse.Namespace, gpu_num: int, gpu_no: int) -> None:
                 fs=model_input_fs,
                 timestep_spacing=args.timestep_spacing,
                 guidance_rescale=args.guidance_rescale,
-                sim_mode=False,
-                decode_video=not args.fast_policy_no_decode)
+                sim_mode=False)
 
             # Update future actions in the observation queues
             for idx in range(len(pred_actions[0])):
@@ -747,7 +615,7 @@ def run_inference(args: argparse.Namespace, gpu_num: int, gpu_no: int) -> None:
 
             # Interaction with the world-model
             print(f'>>> Step {itr}: interacting with world model ...')
-            pred_videos_1, _, pred_states, wm_samples = image_guided_synthesis_sim_mode(
+            pred_videos_1, _, pred_states = image_guided_synthesis_sim_mode(
                 model,
                 "",
                 observation,
@@ -760,16 +628,12 @@ def run_inference(args: argparse.Namespace, gpu_num: int, gpu_no: int) -> None:
                 fs=model_input_fs,
                 text_input=False,
                 timestep_spacing=args.timestep_spacing,
-                guidance_rescale=args.guidance_rescale,
-                decode_video=False)
-
-            # Decode only the last frame for CLIP embedding in next iteration
-            last_frame_pixel = model.decode_first_stage(wm_samples[:, :, -1:, :, :])
+                guidance_rescale=args.guidance_rescale)
 
             for idx in range(args.exe_steps):
                 observation = {
                     'observation.images.top':
-                    last_frame_pixel[0, :, 0:1].permute(1, 0, 2, 3),
+                    pred_videos_1[0][:, idx:idx + 1].permute(1, 0, 2, 3),
                     'observation.state':
                     torch.zeros_like(pred_states[0][idx:idx + 1]) if
                     args.zero_pred_state else pred_states[0][idx:idx + 1],
@@ -780,31 +644,42 @@ def run_inference(args: argparse.Namespace, gpu_num: int, gpu_no: int) -> None:
                 cond_obs_queues = populate_queues(cond_obs_queues,
                                                   observation)
 
-            # Save the imagen videos for decision-making (async)
-            if pred_videos_0 is not None:
-                sample_tag = f"{args.dataset}-vid{sample['videoid']}-dm-fs-{fs}/itr-{itr}"
-                log_to_tensorboard_async(writer,
-                                         pred_videos_0,
-                                         sample_tag,
-                                         fps=args.save_fps)
+            # Save the imagen videos for decision-making
+            sample_tag = f"{args.dataset}-vid{sample['videoid']}-dm-fs-{fs}/itr-{itr}"
+            log_to_tensorboard(writer,
+                               pred_videos_0,
+                               sample_tag,
+                               fps=args.save_fps)
+            # Save videos environment changes via world-model interaction
+            sample_tag = f"{args.dataset}-vid{sample['videoid']}-wd-fs-{fs}/itr-{itr}"
+            log_to_tensorboard(writer,
+                               pred_videos_1,
+                               sample_tag,
+                               fps=args.save_fps)
+
+            # Save the imagen videos for decision-making
+            sample_video_file = f'{video_save_dir}/dm/{fs}/itr-{itr}.mp4'
+            save_results(pred_videos_0.cpu(),
+                         sample_video_file,
+                         fps=args.save_fps)
+            # Save videos environment changes via world-model interaction
+            sample_video_file = f'{video_save_dir}/wm/{fs}/itr-{itr}.mp4'
+            save_results(pred_videos_1.cpu(),
+                         sample_video_file,
+                         fps=args.save_fps)
 
             print('>' * 24)
-            # Store raw latent for deferred decode
-            wm_latent.append(wm_samples[:, :, :args.exe_steps].cpu())
+            # Collect the result of world-model interactions
+            wm_video.append(pred_videos_1[:, :, :args.exe_steps].cpu())
 
-        # Deferred decode: batch decode all stored latents
-        full_latent = torch.cat(wm_latent, dim=2).to(device)
-        full_video = model.decode_first_stage(full_latent).cpu()
+        full_video = torch.cat(wm_video, dim=2)
         sample_tag = f"{args.dataset}-vid{sample['videoid']}-wd-fs-{fs}/full"
-        log_to_tensorboard_async(writer,
-                                 full_video,
-                                 sample_tag,
-                                 fps=args.save_fps)
+        log_to_tensorboard(writer,
+                           full_video,
+                           sample_tag,
+                           fps=args.save_fps)
         sample_full_video_file = f"{video_save_dir}/../{sample['videoid']}_full_fs{fs}.mp4"
-        save_results_async(full_video, sample_full_video_file, fps=args.save_fps)
-
-        # Wait for all async I/O to complete
-        _flush_io()
+        save_results(full_video, sample_full_video_file, fps=args.save_fps)
 
 
 def get_parser():
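The deferred-decode logic removed in the hunk above trades memory for speed: each loop iteration keeps only the cheap latents, and one batched `decode_first_stage` call over the concatenated latents replaces a VAE decode per iteration. A minimal sketch of that control flow with a stub decoder; `collect_then_decode` and `decode_fn` are hypothetical names, and plain lists stand in for latent tensors:

```python
from typing import Callable, Iterable, List


def collect_then_decode(step_latents: Iterable[List[int]],
                        decode_fn: Callable[[List[int]], List[int]],
                        exe_steps: int = 2) -> List[int]:
    """Accumulate cheap per-step latents; run the expensive decode once."""
    wm_latent = []
    for latents in step_latents:
        # Keep only the executed prefix of each prediction window.
        wm_latent.append(latents[:exe_steps])
    # One batched decode at the end instead of one decode per iteration.
    full_latent = [z for chunk in wm_latent for z in chunk]
    return decode_fn(full_latent)
```

With real tensors the concatenation would be `torch.cat(..., dim=2)` along the time axis, as in the removed code; the revert above goes back to decoding inside the loop, which is simpler but pays the decoder cost every step.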
@@ -919,11 +794,6 @@ def get_parser():
         action='store_true',
         default=False,
         help="not using the predicted states as comparison")
-    parser.add_argument(
-        "--fast_policy_no_decode",
-        action='store_true',
-        default=True,
-        help="Speed mode: policy pass only predicts actions, skip policy video decode/log/save.")
     parser.add_argument("--save_fps",
                         type=int,
                         default=8,
@@ -11,9 +11,6 @@ from unifolm_wma.utils.utils import instantiate_from_config
 from unifolm_wma.utils.train import get_trainer_callbacks, get_trainer_logger, get_trainer_strategy
 from unifolm_wma.utils.train import set_logger, init_workspace, load_checkpoints, get_num_parameters
 
-torch.backends.cuda.matmul.allow_tf32 = True
-torch.backends.cudnn.allow_tf32 = True
-
 
 def get_parser(**parser_kwargs):
     parser = argparse.ArgumentParser(**parser_kwargs)
@@ -988,7 +988,7 @@ class LatentDiffusion(DDPM):
 
     def instantiate_cond_stage(self, config: OmegaConf) -> None:
         """
-        Build the conditioning stage model. Frozen models are converted to FP16.
+        Build the conditioning stage model.
 
         Args:
             config: OmegaConf config describing the conditioning model to instantiate.
@@ -1000,7 +1000,6 @@ class LatentDiffusion(DDPM):
             self.cond_stage_model.train = disabled_train
             for param in self.cond_stage_model.parameters():
                 param.requires_grad = False
-            self.cond_stage_model.half()
         else:
             model = instantiate_from_config(config)
             self.cond_stage_model = model
@@ -1015,7 +1014,6 @@ class LatentDiffusion(DDPM):
         Returns:
             Conditioning embedding as a tensor (shape depends on cond model).
         """
-        with torch.cuda.amp.autocast(dtype=torch.float16):
-            if self.cond_stage_forward is None:
-                if hasattr(self.cond_stage_model, 'encode') and callable(
-                        self.cond_stage_model.encode):
+        if self.cond_stage_forward is None:
+            if hasattr(self.cond_stage_model, 'encode') and callable(
+                    self.cond_stage_model.encode):
@@ -1959,7 +1957,6 @@ class LatentVisualDiffusion(LatentDiffusion):
             self.image_proj_model.train = disabled_train
             for param in self.image_proj_model.parameters():
                 param.requires_grad = False
-            self.image_proj_model.half()
 
     def _init_embedder(self, config: OmegaConf, freeze: bool = True) -> None:
         """
@@ -1975,7 +1972,6 @@ class LatentVisualDiffusion(LatentDiffusion):
             self.embedder.train = disabled_train
             for param in self.embedder.parameters():
                 param.requires_grad = False
-            self.embedder.half()
 
     def init_normalizers(self, normalize_config: OmegaConf,
                          dataset_stats: Mapping[str, Any]) -> None:
@@ -2179,7 +2175,6 @@ class LatentVisualDiffusion(LatentDiffusion):
             (random_num < 3 * self.uncond_prob).float(), "n -> n 1 1 1")
 
         cond_img = input_mask * img
-        with torch.cuda.amp.autocast(dtype=torch.float16):
-            cond_img_emb = self.embedder(cond_img)
-            cond_img_emb = self.image_proj_model(cond_img_emb)
+        cond_img_emb = self.embedder(cond_img)
+        cond_img_emb = self.image_proj_model(cond_img_emb)
 
@@ -2196,7 +2191,6 @@ class LatentVisualDiffusion(LatentDiffusion):
                 repeat=z.shape[2])
             cond["c_concat"] = [img_cat_cond]
 
-        with torch.cuda.amp.autocast(dtype=torch.float16):
-            cond_action = self.action_projector(action)
-            cond_action_emb = self.agent_action_pos_emb + cond_action
+        cond_action = self.action_projector(action)
+        cond_action_emb = self.agent_action_pos_emb + cond_action
         # Get conditioning states
@@ -2463,17 +2457,7 @@ class DiffusionWrapper(pl.LightningModule):
         Returns:
             Output from the inner diffusion model (tensor or tuple, depending on the model).
         """
-        with torch.cuda.amp.autocast(dtype=torch.float16):
-            return self._forward_impl(x, x_action, x_state, t,
-                                      c_concat, c_crossattn, c_crossattn_action,
-                                      c_adm, s, mask, **kwargs)
-
-    def _forward_impl(
-        self,
-        x, x_action, x_state, t,
-        c_concat=None, c_crossattn=None, c_crossattn_action=None,
-        c_adm=None, s=None, mask=None, **kwargs,
-    ):
         if self.conditioning_key is None:
             out = self.diffusion_model(x, t)
         elif self.conditioning_key == 'concat':
@@ -501,10 +501,6 @@ class ConditionalUnet1D(nn.Module):
         self.last_frame_only = last_frame_only
         self.horizon = horizon
 
-        # Context precomputation cache
-        self._global_cond_cache_enabled = False
-        self._global_cond_cache = {}
-
     def forward(self,
                 sample: torch.Tensor,
                 timestep: Union[torch.Tensor, float, int],
@@ -534,10 +530,6 @@ class ConditionalUnet1D(nn.Module):
         B, T, D = sample.shape
         if self.use_linear_act_proj:
             sample = self.proj_in_action(sample.unsqueeze(-1))
-            _gc_key = (cond['image'].data_ptr(), cond['agent_pos'].data_ptr())
-            if self._global_cond_cache_enabled and _gc_key in self._global_cond_cache:
-                global_cond = self._global_cond_cache[_gc_key]
-            else:
-                global_cond = self.obs_encoder(cond)
-                global_cond = rearrange(global_cond,
-                                        '(b t) d -> b 1 (t d)',
+            global_cond = self.obs_encoder(cond)
+            global_cond = rearrange(global_cond,
+                                    '(b t) d -> b 1 (t d)',
@@ -546,8 +538,6 @@ class ConditionalUnet1D(nn.Module):
                 global_cond = repeat(global_cond,
                                      'b c d -> b (repeat c) d',
                                      repeat=T)
-                if self._global_cond_cache_enabled:
-                    self._global_cond_cache[_gc_key] = global_cond
         else:
             sample = einops.rearrange(sample, 'b h t -> b t h')
             sample = self.proj_in_horizon(sample)
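The `_global_cond_cache` removed in the two hunks above memoizes the observation-encoder output across denoising steps, keyed on the conditioning tensors' `data_ptr()`, which is only valid while those tensors are reused unmutated for every step of one sampling run. A generic sketch of the same idea keyed on object identity; `MemoizedEncoder` is a hypothetical name:

```python
from typing import Any, Callable, Dict


class MemoizedEncoder:
    """Memoize an expensive encoder keyed on input object identity.

    Only safe while inputs are reused unmutated between calls, which
    mirrors keying a tensor cache on data_ptr().
    """

    def __init__(self, encode_fn: Callable[[Any], Any]):
        self._encode = encode_fn
        self.cache_enabled = False  # off by default, like the removed code
        self._cache: Dict[int, Any] = {}

    def __call__(self, cond: Any) -> Any:
        key = id(cond)
        if self.cache_enabled and key in self._cache:
            # Cache hit: skip re-encoding the identical conditioning.
            return self._cache[key]
        out = self._encode(cond)
        if self.cache_enabled:
            self._cache[key] = out
        return out
```

In a DDIM loop the conditioning is fixed while only the noisy sample and timestep change, so the encoder runs once instead of once per step; the sampler must clear or disable the cache between runs so stale entries cannot leak across different inputs.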
@@ -6,8 +6,6 @@ from unifolm_wma.utils.diffusion import make_ddim_sampling_parameters, make_ddim
 from unifolm_wma.utils.common import noise_like
 from unifolm_wma.utils.common import extract_into_tensor
 from tqdm import tqdm
-from unifolm_wma.modules.attention import enable_cross_attn_kv_cache, disable_cross_attn_kv_cache
-from unifolm_wma.modules.networks.wma_model import enable_ctx_cache, disable_ctx_cache
 
 
 class DDIMSampler(object):
@@ -69,12 +67,11 @@ class DDIMSampler(object):
             ddim_timesteps=self.ddim_timesteps,
             eta=ddim_eta,
             verbose=verbose)
-        # Ensure tensors are on correct device for efficient indexing
-        self.register_buffer('ddim_sigmas', to_torch(torch.as_tensor(ddim_sigmas)))
-        self.register_buffer('ddim_alphas', to_torch(torch.as_tensor(ddim_alphas)))
-        self.register_buffer('ddim_alphas_prev', to_torch(torch.as_tensor(ddim_alphas_prev)))
+        self.register_buffer('ddim_sigmas', ddim_sigmas)
+        self.register_buffer('ddim_alphas', ddim_alphas)
+        self.register_buffer('ddim_alphas_prev', ddim_alphas_prev)
         self.register_buffer('ddim_sqrt_one_minus_alphas',
-                             to_torch(torch.as_tensor(np.sqrt(1. - ddim_alphas))))
+                             np.sqrt(1. - ddim_alphas))
         sigmas_for_original_sampling_steps = ddim_eta * torch.sqrt(
             (1 - self.alphas_cumprod_prev) / (1 - self.alphas_cumprod) *
             (1 - self.alphas_cumprod / self.alphas_cumprod_prev))
@@ -244,13 +241,9 @@ class DDIMSampler(object):
|
|||||||
|
|
||||||
dp_ddim_scheduler_action.set_timesteps(len(timesteps))
|
dp_ddim_scheduler_action.set_timesteps(len(timesteps))
|
||||||
dp_ddim_scheduler_state.set_timesteps(len(timesteps))
|
dp_ddim_scheduler_state.set_timesteps(len(timesteps))
|
||||||
ts = torch.empty((b, ), device=device, dtype=torch.long)
|
|
||||||
enable_cross_attn_kv_cache(self.model)
|
|
||||||
enable_ctx_cache(self.model)
|
|
||||||
try:
|
|
||||||
for i, step in enumerate(iterator):
|
for i, step in enumerate(iterator):
|
||||||
index = total_steps - i - 1
|
index = total_steps - i - 1
|
||||||
ts.fill_(step)
|
ts = torch.full((b, ), step, device=device, dtype=torch.long)
|
||||||
|
|
||||||
# Use mask to blend noised original latent (img_orig) & new sampled latent (img)
|
# Use mask to blend noised original latent (img_orig) & new sampled latent (img)
|
||||||
if mask is not None:
|
if mask is not None:
|
||||||
@@ -305,9 +298,6 @@ class DDIMSampler(object):
|
|||||||
intermediates['pred_x0'].append(pred_x0)
|
intermediates['pred_x0'].append(pred_x0)
|
||||||
intermediates['x_inter_action'].append(action)
|
intermediates['x_inter_action'].append(action)
|
||||||
intermediates['x_inter_state'].append(state)
|
intermediates['x_inter_state'].append(state)
|
||||||
finally:
|
|
||||||
disable_cross_attn_kv_cache(self.model)
|
|
||||||
disable_ctx_cache(self.model)
|
|
||||||
|
|
||||||
return img, action, state, intermediates
|
return img, action, state, intermediates
|
||||||
|
|
||||||
@@ -335,6 +325,10 @@ class DDIMSampler(object):
|
|||||||
guidance_rescale=0.0,
|
guidance_rescale=0.0,
|
||||||
**kwargs):
|
**kwargs):
|
||||||
b, *_, device = *x.shape, x.device
|
b, *_, device = *x.shape, x.device
|
||||||
|
if x.dim() == 5:
|
||||||
|
is_video = True
|
||||||
|
else:
|
||||||
|
is_video = False
|
||||||
|
|
||||||
if unconditional_conditioning is None or unconditional_guidance_scale == 1.:
|
if unconditional_conditioning is None or unconditional_guidance_scale == 1.:
|
||||||
model_output, model_output_action, model_output_state = self.model.apply_model(
|
model_output, model_output_action, model_output_state = self.model.apply_model(
|
||||||
@@ -383,11 +377,17 @@ class DDIMSampler(object):
|
|||||||
sqrt_one_minus_alphas = self.model.sqrt_one_minus_alphas_cumprod if use_original_steps else self.ddim_sqrt_one_minus_alphas
|
sqrt_one_minus_alphas = self.model.sqrt_one_minus_alphas_cumprod if use_original_steps else self.ddim_sqrt_one_minus_alphas
|
||||||
sigmas = self.ddim_sigmas_for_original_num_steps if use_original_steps else self.ddim_sigmas
|
sigmas = self.ddim_sigmas_for_original_num_steps if use_original_steps else self.ddim_sigmas
|
||||||
|
|
||||||
# Use 0-d tensors directly (already on device); broadcasting handles shape
|
if is_video:
|
||||||
a_t = alphas[index]
|
size = (b, 1, 1, 1, 1)
|
||||||
a_prev = alphas_prev[index]
|
else:
|
||||||
sigma_t = sigmas[index]
|
size = (b, 1, 1, 1)
|
||||||
sqrt_one_minus_at = sqrt_one_minus_alphas[index]
|
|
||||||
|
a_t = torch.full(size, alphas[index], device=device)
|
||||||
|
a_prev = torch.full(size, alphas_prev[index], device=device)
|
||||||
|
sigma_t = torch.full(size, sigmas[index], device=device)
|
||||||
|
sqrt_one_minus_at = torch.full(size,
|
||||||
|
sqrt_one_minus_alphas[index],
|
||||||
|
device=device)
|
||||||
|
|
||||||
if self.model.parameterization != "v":
|
if self.model.parameterization != "v":
|
||||||
pred_x0 = (x - sqrt_one_minus_at * e_t) / a_t.sqrt()
|
pred_x0 = (x - sqrt_one_minus_at * e_t) / a_t.sqrt()
|
||||||
@@ -395,8 +395,12 @@ class DDIMSampler(object):
|
|||||||
pred_x0 = self.model.predict_start_from_z_and_v(x, t, model_output)
|
pred_x0 = self.model.predict_start_from_z_and_v(x, t, model_output)
|
||||||
|
|
||||||
if self.model.use_dynamic_rescale:
|
if self.model.use_dynamic_rescale:
|
||||||
scale_t = self.ddim_scale_arr[index]
|
scale_t = torch.full(size,
|
||||||
prev_scale_t = self.ddim_scale_arr_prev[index]
|
self.ddim_scale_arr[index],
|
||||||
|
device=device)
|
||||||
|
prev_scale_t = torch.full(size,
|
||||||
|
self.ddim_scale_arr_prev[index],
|
||||||
|
device=device)
|
||||||
rescale = (prev_scale_t / scale_t)
|
rescale = (prev_scale_t / scale_t)
|
||||||
pred_x0 *= rescale
|
pred_x0 *= rescale
|
||||||
|
|
||||||
|
|||||||
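The hunks above swap 0-d indexed coefficients (which rely on broadcasting) back to `torch.full` tensors of the explicit broadcast shape. The two forms are numerically identical; a quick check of that equivalence (shapes and names here are illustrative):

```python
import torch

# A schedule buffer and one timestep index, as in p_sample_ddim.
alphas = torch.linspace(0.1, 0.9, 5)
b, index = 2, 3
size = (b, 1, 1, 1)  # the non-video branch of the diff

# Materialized coefficient tensor (the reverted code path) ...
a_t_full = torch.full(size, alphas[index].item())
# ... versus the 0-d tensor that broadcasting expands implicitly.
a_t_scalar = alphas[index]

x = torch.randn(b, 4, 8, 8)
assert torch.allclose(x * a_t_scalar, x * a_t_full)
```

The difference is purely about where the expansion happens, so reverting it trades a tiny allocation per step for simpler device handling.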
@@ -98,10 +98,6 @@ class CrossAttention(nn.Module):
         self.text_context_len = text_context_len
         self.agent_state_context_len = agent_state_context_len
         self.agent_action_context_len = agent_action_context_len
-        self._kv_cache = {}
-        self._kv_cache_enabled = False
-        self._kv_fused = False
 
         self.cross_attention_scale_learnable = cross_attention_scale_learnable
         if self.image_cross_attention:
             self.to_k_ip = nn.Linear(context_dim, inner_dim, bias=False)
@@ -118,27 +114,6 @@ class CrossAttention(nn.Module):
             self.register_parameter('alpha_caa',
                                     nn.Parameter(torch.tensor(0.)))
 
-    def fuse_kv(self):
-        """Fuse to_k/to_v into to_kv (2 Linear → 1). Works for all layers."""
-        k_w = self.to_k.weight  # (inner_dim, context_dim)
-        v_w = self.to_v.weight
-        self.to_kv = nn.Linear(k_w.shape[1], k_w.shape[0] * 2, bias=False)
-        self.to_kv.weight = nn.Parameter(torch.cat([k_w, v_w], dim=0))
-        del self.to_k, self.to_v
-        if self.image_cross_attention:
-            for suffix in ('_ip', '_as', '_aa'):
-                k_attr = f'to_k{suffix}'
-                v_attr = f'to_v{suffix}'
-                kw = getattr(self, k_attr).weight
-                vw = getattr(self, v_attr).weight
-                fused = nn.Linear(kw.shape[1], kw.shape[0] * 2, bias=False)
-                fused.weight = nn.Parameter(torch.cat([kw, vw], dim=0))
-                setattr(self, f'to_kv{suffix}', fused)
-                delattr(self, k_attr)
-                delattr(self, v_attr)
-        self._kv_fused = True
-        return True
-
     def forward(self, x, context=None, mask=None):
         spatial_self_attn = (context is None)
         k_ip, v_ip, out_ip = None, None, None
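The deleted `fuse_kv` concatenates the `to_k` and `to_v` weight matrices along the output dimension so one GEMM replaces two, then recovers K and V with a `chunk`. The core trick can be verified in isolation (dimensions below are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
ctx_dim, inner_dim = 16, 32

# Two separate bias-free projections, as in the unfused CrossAttention.
to_k = nn.Linear(ctx_dim, inner_dim, bias=False)
to_v = nn.Linear(ctx_dim, inner_dim, bias=False)

# Fused projection: stack the weights along out_features (dim 0),
# so the first half of the output is K and the second half is V.
to_kv = nn.Linear(ctx_dim, inner_dim * 2, bias=False)
to_kv.weight = nn.Parameter(torch.cat([to_k.weight, to_v.weight], dim=0))

ctx = torch.randn(4, 7, ctx_dim)
k_fused, v_fused = to_kv(ctx).chunk(2, dim=-1)
assert torch.allclose(k_fused, to_k(ctx), atol=1e-6)
assert torch.allclose(v_fused, to_v(ctx), atol=1e-6)
```

Fusion only works cleanly here because the projections share the same input and have no bias; with biases the concatenation would have to stack `bias` vectors the same way.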
@@ -165,12 +140,6 @@ class CrossAttention(nn.Module):
                                       self.agent_action_context_len +
                                       self.text_context_len:, :]
 
-                if self._kv_fused:
-                    k, v = self.to_kv(context_ins).chunk(2, dim=-1)
-                    k_ip, v_ip = self.to_kv_ip(context_image).chunk(2, dim=-1)
-                    k_as, v_as = self.to_kv_as(context_agent_state).chunk(2, dim=-1)
-                    k_aa, v_aa = self.to_kv_aa(context_agent_action).chunk(2, dim=-1)
-                else:
                 k = self.to_k(context_ins)
                 v = self.to_v(context_ins)
                 k_ip = self.to_k_ip(context_image)
@@ -182,9 +151,6 @@ class CrossAttention(nn.Module):
         else:
             if not spatial_self_attn:
                 context = context[:, :self.text_context_len, :]
-                if self._kv_fused:
-                    k, v = self.to_kv(context).chunk(2, dim=-1)
-                else:
             k = self.to_k(context)
             v = self.to_v(context)
 
@@ -270,78 +236,33 @@ class CrossAttention(nn.Module):
         k_ip, v_ip, out_ip = None, None, None
         k_as, v_as, out_as = None, None, None
         k_aa, v_aa, out_aa = None, None, None
-        attn_mask_aa = None
 
-        h = self.heads
         q = self.to_q(x)
         context = default(context, x)
 
-        b, _, _ = q.shape
-        q = q.unsqueeze(3).reshape(b, q.shape[1], h, self.dim_head).permute(0, 2, 1, 3).reshape(b * h, q.shape[1], self.dim_head).contiguous()
-
-        def _reshape_kv(t):
-            return t.unsqueeze(3).reshape(b, t.shape[1], h, self.dim_head).permute(0, 2, 1, 3).reshape(b * h, t.shape[1], self.dim_head).contiguous()
-
-        use_cache = self._kv_cache_enabled and not spatial_self_attn
-        cache_hit = use_cache and len(self._kv_cache) > 0
-
-        if cache_hit:
-            k = self._kv_cache['k']
-            v = self._kv_cache['v']
-            k_ip = self._kv_cache.get('k_ip')
-            v_ip = self._kv_cache.get('v_ip')
-            k_as = self._kv_cache.get('k_as')
-            v_as = self._kv_cache.get('v_as')
-            k_aa = self._kv_cache.get('k_aa')
-            v_aa = self._kv_cache.get('v_aa')
-            attn_mask_aa = self._kv_cache.get('attn_mask_aa')
-        elif self.image_cross_attention and not spatial_self_attn:
+        if self.image_cross_attention and not spatial_self_attn:
             if context.shape[1] == self.text_context_len + self.video_length:
                 context_ins, context_image = context[:, :self.text_context_len, :], context[:,self.text_context_len:, :]
-                if self._kv_fused:
-                    k, v = self.to_kv(context).chunk(2, dim=-1)
-                    k_ip, v_ip = self.to_kv_ip(context_image).chunk(2, dim=-1)
-                else:
                 k = self.to_k(context)
                 v = self.to_v(context)
                 k_ip = self.to_k_ip(context_image)
                 v_ip = self.to_v_ip(context_image)
-                k, v = map(_reshape_kv, (k, v))
-                k_ip, v_ip = map(_reshape_kv, (k_ip, v_ip))
-                if use_cache:
-                    self._kv_cache = {'k': k, 'v': v, 'k_ip': k_ip, 'v_ip': v_ip}
             elif context.shape[1] == self.agent_state_context_len + self.text_context_len + self.video_length:
                 context_agent_state = context[:, :self.agent_state_context_len, :]
                 context_ins = context[:, self.agent_state_context_len:self.agent_state_context_len+self.text_context_len, :]
                 context_image = context[:, self.agent_state_context_len+self.text_context_len:, :]
-                if self._kv_fused:
-                    k, v = self.to_kv(context_ins).chunk(2, dim=-1)
-                    k_ip, v_ip = self.to_kv_ip(context_image).chunk(2, dim=-1)
-                    k_as, v_as = self.to_kv_as(context_agent_state).chunk(2, dim=-1)
-                else:
                 k = self.to_k(context_ins)
                 v = self.to_v(context_ins)
                 k_ip = self.to_k_ip(context_image)
                 v_ip = self.to_v_ip(context_image)
                 k_as = self.to_k_as(context_agent_state)
                 v_as = self.to_v_as(context_agent_state)
-                k, v = map(_reshape_kv, (k, v))
-                k_ip, v_ip = map(_reshape_kv, (k_ip, v_ip))
-                k_as, v_as = map(_reshape_kv, (k_as, v_as))
-                if use_cache:
-                    self._kv_cache = {'k': k, 'v': v, 'k_ip': k_ip, 'v_ip': v_ip, 'k_as': k_as, 'v_as': v_as}
             else:
                 context_agent_state = context[:, :self.agent_state_context_len, :]
                 context_agent_action = context[:, self.agent_state_context_len:self.agent_state_context_len+self.agent_action_context_len, :]
                 context_ins = context[:, self.agent_state_context_len+self.agent_action_context_len:self.agent_state_context_len+self.agent_action_context_len+self.text_context_len, :]
                 context_image = context[:, self.agent_state_context_len+self.agent_action_context_len+self.text_context_len:, :]
 
-                if self._kv_fused:
-                    k, v = self.to_kv(context_ins).chunk(2, dim=-1)
-                    k_ip, v_ip = self.to_kv_ip(context_image).chunk(2, dim=-1)
-                    k_as, v_as = self.to_kv_as(context_agent_state).chunk(2, dim=-1)
-                    k_aa, v_aa = self.to_kv_aa(context_agent_action).chunk(2, dim=-1)
-                else:
                 k = self.to_k(context_ins)
                 v = self.to_v(context_ins)
                 k_ip = self.to_k_ip(context_image)
@@ -351,81 +272,98 @@ class CrossAttention(nn.Module):
                 k_aa = self.to_k_aa(context_agent_action)
                 v_aa = self.to_v_aa(context_agent_action)
 
-                k, v = map(_reshape_kv, (k, v))
-                k_ip, v_ip = map(_reshape_kv, (k_ip, v_ip))
-                k_as, v_as = map(_reshape_kv, (k_as, v_as))
-                k_aa, v_aa = map(_reshape_kv, (k_aa, v_aa))
-
-                attn_mask_aa_raw = self._get_attn_mask_aa(x.shape[0],
+                attn_mask_aa = self._get_attn_mask_aa(x.shape[0],
                                                           q.shape[1],
                                                           k_aa.shape[1],
-                                                          block_size=16,
-                                                          device=k_aa.device)
-                attn_mask_aa = attn_mask_aa_raw.unsqueeze(1).repeat(1, h, 1, 1).reshape(
-                    b * h, attn_mask_aa_raw.shape[1], attn_mask_aa_raw.shape[2]).to(q.dtype)
-
-                if use_cache:
-                    self._kv_cache = {
-                        'k': k, 'v': v, 'k_ip': k_ip, 'v_ip': v_ip,
-                        'k_as': k_as, 'v_as': v_as, 'k_aa': k_aa, 'v_aa': v_aa,
-                        'attn_mask_aa': attn_mask_aa,
-                    }
+                                                          block_size=16).to(k_aa.device)
         else:
             if not spatial_self_attn:
                 assert 1 > 2, ">>> ERROR: you should never go into here ..."
                 context = context[:, :self.text_context_len, :]
-                if self._kv_fused:
-                    k, v = self.to_kv(context).chunk(2, dim=-1)
-                else:
                 k = self.to_k(context)
                 v = self.to_v(context)
-                k, v = map(_reshape_kv, (k, v))
-                if use_cache:
-                    self._kv_cache = {'k': k, 'v': v}
+        b, _, _ = q.shape
+        q = q.unsqueeze(3).reshape(b, q.shape[1], self.heads, self.dim_head).permute(0, 2, 1, 3).reshape(b * self.heads, q.shape[1], self.dim_head).contiguous()
         if k is not None:
+            k, v = map(
+                lambda t: t.unsqueeze(3).reshape(b, t.shape[
+                    1], self.heads, self.dim_head).permute(0, 2, 1, 3).reshape(
+                        b * self.heads, t.shape[1], self.dim_head).contiguous(),
+                (k, v),
+            )
             out = xformers.ops.memory_efficient_attention(q,
                                                           k,
                                                           v,
                                                           attn_bias=None,
                                                           op=None)
             out = (out.unsqueeze(0).reshape(
-                b, h, out.shape[1],
+                b, self.heads, out.shape[1],
                 self.dim_head).permute(0, 2, 1,
                                        3).reshape(b, out.shape[1],
-                                                  h * self.dim_head))
+                                                  self.heads * self.dim_head))
 
         if k_ip is not None:
+            # For image cross-attention
+            k_ip, v_ip = map(
+                lambda t: t.unsqueeze(3).reshape(b, t.shape[
+                    1], self.heads, self.dim_head).permute(0, 2, 1, 3).reshape(
+                        b * self.heads, t.shape[1], self.dim_head).contiguous(
+                        ),
+                (k_ip, v_ip),
+            )
             out_ip = xformers.ops.memory_efficient_attention(q,
                                                              k_ip,
                                                              v_ip,
                                                              attn_bias=None,
                                                              op=None)
             out_ip = (out_ip.unsqueeze(0).reshape(
-                b, h, out_ip.shape[1],
+                b, self.heads, out_ip.shape[1],
                 self.dim_head).permute(0, 2, 1,
                                        3).reshape(b, out_ip.shape[1],
-                                                  h * self.dim_head))
+                                                  self.heads * self.dim_head))
 
         if k_as is not None:
+            # For agent state cross-attention
+            k_as, v_as = map(
+                lambda t: t.unsqueeze(3).reshape(b, t.shape[
+                    1], self.heads, self.dim_head).permute(0, 2, 1, 3).reshape(
+                        b * self.heads, t.shape[1], self.dim_head).contiguous(
+                        ),
+                (k_as, v_as),
+            )
             out_as = xformers.ops.memory_efficient_attention(q,
                                                              k_as,
                                                              v_as,
                                                              attn_bias=None,
                                                              op=None)
             out_as = (out_as.unsqueeze(0).reshape(
-                b, h, out_as.shape[1],
+                b, self.heads, out_as.shape[1],
                 self.dim_head).permute(0, 2, 1,
                                        3).reshape(b, out_as.shape[1],
-                                                  h * self.dim_head))
+                                                  self.heads * self.dim_head))
 
         if k_aa is not None:
+            # For agent action cross-attention
+            k_aa, v_aa = map(
+                lambda t: t.unsqueeze(3).reshape(b, t.shape[
+                    1], self.heads, self.dim_head).permute(0, 2, 1, 3).reshape(
+                        b * self.heads, t.shape[1], self.dim_head).contiguous(
+                        ),
+                (k_aa, v_aa),
+            )
+
+            attn_mask_aa = attn_mask_aa.unsqueeze(1).repeat(1,self.heads,1,1).reshape(
+                b * self.heads, attn_mask_aa.shape[1], attn_mask_aa.shape[2])
+            attn_mask_aa = attn_mask_aa.to(q.dtype)
+
             out_aa = xformers.ops.memory_efficient_attention(
                 q, k_aa, v_aa, attn_bias=attn_mask_aa, op=None)
 
             out_aa = (out_aa.unsqueeze(0).reshape(
-                b, h, out_aa.shape[1],
+                b, self.heads, out_aa.shape[1],
                 self.dim_head).permute(0, 2, 1,
                                        3).reshape(b, out_aa.shape[1],
-                                                  h * self.dim_head))
+                                                  self.heads * self.dim_head))
         if exists(mask):
             raise NotImplementedError
 
@@ -448,43 +386,17 @@ class CrossAttention(nn.Module):
 
         return self.to_out(out)
 
-    def _get_attn_mask_aa(self, b, l1, l2, block_size=16, device=None):
-        cache_key = (b, l1, l2, block_size)
-        if hasattr(self, '_attn_mask_aa_cache_key') and self._attn_mask_aa_cache_key == cache_key:
-            cached = self._attn_mask_aa_cache
-            if device is not None and cached.device != torch.device(device):
-                cached = cached.to(device)
-                self._attn_mask_aa_cache = cached
-            return cached
-
-        target_device = device if device is not None else 'cpu'
+    def _get_attn_mask_aa(self, b, l1, l2, block_size=16):
         num_token = l2 // block_size
-        start_positions = ((torch.arange(b, device=target_device) % block_size) + 1) * num_token
-        col_indices = torch.arange(l2, device=target_device)
+        start_positions = ((torch.arange(b) % block_size) + 1) * num_token
+        col_indices = torch.arange(l2)
         mask_2d = col_indices.unsqueeze(0) >= start_positions.unsqueeze(1)
         mask = mask_2d.unsqueeze(1).expand(b, l1, l2)
-        attn_mask = torch.zeros(b, l1, l2, dtype=torch.float, device=target_device)
+        attn_mask = torch.zeros_like(mask, dtype=torch.float)
         attn_mask[mask] = float('-inf')
-
-        self._attn_mask_aa_cache_key = cache_key
-        self._attn_mask_aa_cache = attn_mask
        return attn_mask
 
 
-def enable_cross_attn_kv_cache(module):
-    for m in module.modules():
-        if isinstance(m, CrossAttention):
-            m._kv_cache_enabled = True
-            m._kv_cache = {}
-
-
-def disable_cross_attn_kv_cache(module):
-    for m in module.modules():
-        if isinstance(m, CrossAttention):
-            m._kv_cache_enabled = False
-            m._kv_cache = {}
-
-
 class BasicTransformerBlock(nn.Module):
 
     def __init__(self,
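`_get_attn_mask_aa` builds an additive attention bias: batch row `b` may attend only to the first `((b % block_size) + 1) * (l2 // block_size)` action tokens, and every later column is set to `-inf`. A standalone sketch of the same construction on toy sizes:

```python
import torch


def get_attn_mask_aa(b, l1, l2, block_size=16):
    """Additive bias of shape (b, l1, l2): row i sees a prefix of the
    action tokens whose length grows with (i % block_size)."""
    num_token = l2 // block_size
    start_positions = ((torch.arange(b) % block_size) + 1) * num_token
    col_indices = torch.arange(l2)
    # True where the column index is past this row's visible prefix.
    mask_2d = col_indices.unsqueeze(0) >= start_positions.unsqueeze(1)
    mask = mask_2d.unsqueeze(1).expand(b, l1, l2)
    attn_mask = torch.zeros(b, l1, l2)
    attn_mask[mask] = float('-inf')
    return attn_mask


m = get_attn_mask_aa(b=2, l1=3, l2=8, block_size=4)
# num_token = 2, so row 0 sees 2 action columns and row 1 sees 4.
print((m[0, 0] == 0).sum().item(), (m[1, 0] == 0).sum().item())  # 2 4
```

Because xformers expects the bias broadcast over heads, the callers above repeat this `(b, l1, l2)` mask to `(b * heads, l1, l2)` before passing it as `attn_bias`.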
@@ -685,21 +685,6 @@ class WMAModel(nn.Module):
         self.action_token_projector = instantiate_from_config(
             stem_process_config)
 
-        # Context precomputation cache
-        self._ctx_cache_enabled = False
-        self._ctx_cache = {}
-        # Reusable CUDA stream for parallel state_unet / action_unet
-        self._state_stream = torch.cuda.Stream()
-
-    def __getstate__(self):
-        state = self.__dict__.copy()
-        state.pop('_state_stream', None)
-        return state
-
-    def __setstate__(self, state):
-        self.__dict__.update(state)
-        self._state_stream = torch.cuda.Stream()
-
     def forward(self,
                 x: Tensor,
                 x_action: Tensor,
@@ -735,10 +720,6 @@ class WMAModel(nn.Module):
                                   repeat_only=False).type(x.dtype)
         emb = self.time_embed(t_emb)
 
-        _ctx_key = context.data_ptr()
-        if self._ctx_cache_enabled and _ctx_key in self._ctx_cache:
-            context = self._ctx_cache[_ctx_key]
-        else:
         bt, l_context, _ = context.shape
         if self.base_model_gen_only:
             assert l_context == 77 + self.n_obs_steps * 16, ">>> ERROR Context dim 1 ..."  ## NOTE HANDCODE
@@ -791,8 +772,6 @@ class WMAModel(nn.Module):
                 context_img
             ],
                                 dim=1)
-            if self._ctx_cache_enabled:
-                self._ctx_cache[_ctx_key] = context
 
         emb = emb.repeat_interleave(repeats=t, dim=0)
 
@@ -853,45 +832,17 @@ class WMAModel(nn.Module):
 
         if not self.base_model_gen_only:
             ba, _, _ = x_action.shape
-            ts_state = timesteps[:ba] if b > 1 else timesteps
-            # Run action_unet and state_unet in parallel via CUDA streams
-            s_stream = self._state_stream
-            s_stream.wait_stream(torch.cuda.current_stream())
-            with torch.cuda.stream(s_stream):
-                s_y = self.state_unet(x_state, ts_state, hs_a,
-                                      context_action[:2], **kwargs)
             a_y = self.action_unet(x_action, timesteps[:ba], hs_a,
                                    context_action[:2], **kwargs)
-            torch.cuda.current_stream().wait_stream(s_stream)
+            # Predict state
+            if b > 1:
+                s_y = self.state_unet(x_state, timesteps[:ba], hs_a,
+                                      context_action[:2], **kwargs)
+            else:
+                s_y = self.state_unet(x_state, timesteps, hs_a,
+                                      context_action[:2], **kwargs)
         else:
             a_y = torch.zeros_like(x_action)
             s_y = torch.zeros_like(x_state)
 
         return y, a_y, s_y
-
-
-def enable_ctx_cache(model):
-    """Enable context precomputation cache on WMAModel and its action/state UNets."""
-    for m in model.modules():
-        if isinstance(m, WMAModel):
-            m._ctx_cache_enabled = True
-            m._ctx_cache = {}
-    # conditional_unet1d cache
-    from unifolm_wma.models.diffusion_head.conditional_unet1d import ConditionalUnet1D
-    for m in model.modules():
-        if isinstance(m, ConditionalUnet1D):
-            m._global_cond_cache_enabled = True
-            m._global_cond_cache = {}
-
-
-def disable_ctx_cache(model):
-    """Disable and clear context precomputation cache."""
-    for m in model.modules():
-        if isinstance(m, WMAModel):
-            m._ctx_cache_enabled = False
-            m._ctx_cache = {}
-    from unifolm_wma.models.diffusion_head.conditional_unet1d import ConditionalUnet1D
-    for m in model.modules():
-        if isinstance(m, ConditionalUnet1D):
-            m._global_cond_cache_enabled = False
-            m._global_cond_cache = {}
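The deleted WMAModel code ran `state_unet` on a side CUDA stream while `action_unet` ran on the default stream, since the two heads are independent given their inputs. A minimal sketch of that overlap pattern, with stand-in modules and a sequential CPU fallback (names here are illustrative):

```python
import torch
import torch.nn as nn

# Stand-ins for the two independent diffusion heads.
action_net = nn.Linear(8, 8)
state_net = nn.Linear(8, 8)
x_a, x_s = torch.randn(2, 8), torch.randn(2, 8)

if torch.cuda.is_available():
    side = torch.cuda.Stream()
    # The side stream must first see all prior writes (e.g. to x_s).
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        s_y = state_net(x_s)          # launched on the side stream
    a_y = action_net(x_a)             # overlaps on the default stream
    # Join: default stream may not read s_y before the side stream is done.
    torch.cuda.current_stream().wait_stream(side)
else:
    s_y = state_net(x_s)
    a_y = action_net(x_a)
```

The `wait_stream` calls on both sides are what make this safe; dropping either one reintroduces a read-before-write race, which is one reason a revert to simple sequential calls (as in the hunk above) is attractive.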
@@ -1,16 +1,21 @@
-2026-02-11 19:14:09.599811: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 19:14:09.649058: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 19:14:09.649103: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 19:14:09.650392: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 19:14:09.657857: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 19:49:03.885238: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 19:49:03.934263: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 19:49:03.934309: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 19:49:03.935622: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 19:49:03.943041: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 19:14:10.584900: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 19:49:04.852993: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
->>> Prepared model loaded.
->>> Diffusion backbone (model.model) converted to FP16.
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,46 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
 
-  0%|          | 0/11 [00:00<?, ?it/s]
   0%|          | 0/11 [00:00<?, ?it/s]>>> Step 0: generating actions ...
- 18%|█▊        | 2/11 [00:45<03:22, 22.52s/it]
- 27%|██▋       | 3/11 [01:07<03:00, 22.52s/it]
- 36%|███▋      | 4/11 [01:30<02:38, 22.60s/it]
- 45%|████▌     | 5/11 [01:53<02:16, 22.70s/it]
- 55%|█████▍    | 6/11 [02:16<01:53, 22.74s/it]
- 64%|██████▎   | 7/11 [02:38<01:31, 22.76s/it]
- 73%|███████▎  | 8/11 [03:01<01:08, 22.77s/it]
- 82%|████████▏ | 9/11 [03:24<00:45, 22.76s/it]
- 91%|█████████ | 10/11 [03:47<00:22, 22.76s/it]
-100%|██████████| 11/11 [04:09<00:00, 22.77s/it]
-100%|██████████| 11/11 [04:09<00:00, 22.73s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
|
|
||||||
>>>>>>>>>>>>>>>>>>>>>>>>
|
|
||||||
>>> Step 4: generating actions ...
|
|
||||||
>>> Step 4: interacting with world model ...
|
|
||||||
>>>>>>>>>>>>>>>>>>>>>>>>
|
|
||||||
>>> Step 5: generating actions ...
|
|
||||||
>>> Step 5: interacting with world model ...
|
|
||||||
>>>>>>>>>>>>>>>>>>>>>>>>
|
|
||||||
>>> Step 6: generating actions ...
|
|
||||||
>>> Step 6: interacting with world model ...
|
|
||||||
>>> Step 0: interacting with world model ...
|
>>> Step 0: interacting with world model ...
|
||||||
DEBUG:PIL.Image:Importing BlpImagePlugin
|
DEBUG:PIL.Image:Importing BlpImagePlugin
|
||||||
DEBUG:PIL.Image:Importing BmpImagePlugin
|
DEBUG:PIL.Image:Importing BmpImagePlugin
|
||||||
@@ -117,7 +89,39 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
|
|||||||
DEBUG:PIL.Image:Importing WmfImagePlugin
|
DEBUG:PIL.Image:Importing WmfImagePlugin
|
||||||
DEBUG:PIL.Image:Importing XbmImagePlugin
|
DEBUG:PIL.Image:Importing XbmImagePlugin
|
||||||
DEBUG:PIL.Image:Importing XpmImagePlugin
|
DEBUG:PIL.Image:Importing XpmImagePlugin
|
||||||
|
DEBUG:PIL.Image:Importing XVThumbImagePlugin
|
||||||
|
|
||||||
|
9%|▉ | 1/11 [01:12<12:09, 72.95s/it]
|
||||||
|
18%|█▊ | 2/11 [02:26<10:58, 73.19s/it]
|
||||||
|
27%|██▋ | 3/11 [03:39<09:45, 73.21s/it]
|
||||||
|
36%|███▋ | 4/11 [04:52<08:32, 73.21s/it]
|
||||||
|
45%|████▌ | 5/11 [06:05<07:19, 73.22s/it]
|
||||||
|
55%|█████▍ | 6/11 [07:19<06:06, 73.20s/it]
|
||||||
|
64%|██████▎ | 7/11 [08:32<04:52, 73.16s/it]
|
||||||
|
73%|███████▎ | 8/11 [09:45<03:39, 73.14s/it]
|
||||||
|
82%|████████▏ | 9/11 [10:58<02:26, 73.13s/it]
|
||||||
|
91%|█████████ | 10/11 [12:11<01:13, 73.15s/it]
|
||||||
|
100%|██████████| 11/11 [13:24<00:00, 73.18s/it]
|
||||||
|
100%|██████████| 11/11 [13:24<00:00, 73.17s/it]
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 1: generating actions ...
|
||||||
|
>>> Step 1: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 2: generating actions ...
|
||||||
|
>>> Step 2: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 3: generating actions ...
|
||||||
|
>>> Step 3: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 4: generating actions ...
|
||||||
|
>>> Step 4: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 5: generating actions ...
|
||||||
|
>>> Step 5: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 6: generating actions ...
|
||||||
|
>>> Step 6: interacting with world model ...
|
||||||
>>>>>>>>>>>>>>>>>>>>>>>>
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
DEBUG:PIL.Image:Importing PsdImagePlugin
|
>>> Step 7: generating actions ...
|
||||||
DEBUG:PIL.Image:Importing QoiImagePlugin
|
>>> Step 7: interacting with world model ...
|
||||||
DEBUG:PIL.Image:Importing SgiImagePlugin
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
|||||||
Binary file not shown.
@@ -1,5 +1,5 @@
 {
     "gt_video": "unitree_g1_pack_camera/case1/unitree_g1_pack_camera_case1.mp4",
     "pred_video": "unitree_g1_pack_camera/case1/output/inference/0_full_fs6.mp4",
-    "psnr": 32.340256576190384
+    "psnr": 35.615362167470806
 }
@@ -20,6 +20,5 @@ dataset="unitree_g1_pack_camera"
 --n_iter 11 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
---fast_policy_no_decode
+--perframe_ae
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 17:41:30.163933: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 17:41:30.213409: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 17:41:30.213453: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 17:41:30.214760: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 17:41:30.222233: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 20:04:06.049535: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 20:04:06.099186: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 20:04:06.099232: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 20:04:06.100544: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 20:04:06.108023: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 17:41:31.146811: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 20:04:07.025500: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
->>> Prepared model loaded.
->>> Diffusion backbone (model.model) converted to FP16.
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,46 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
-0%| | 0/11 [00:00<?, ?it/s]
 0%| | 0/11 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-18%|█▊ | 2/11 [00:46<03:26, 23.00s/it]
-27%|██▋ | 3/11 [01:08<03:03, 22.93s/it]
-36%|███▋ | 4/11 [01:31<02:40, 22.88s/it]
-45%|████▌ | 5/11 [01:54<02:17, 22.86s/it]
-55%|█████▍ | 6/11 [02:17<01:54, 22.84s/it]
-64%|██████▎ | 7/11 [02:40<01:31, 22.82s/it]
-73%|███████▎ | 8/11 [03:02<01:08, 22.80s/it]
-82%|████████▏ | 9/11 [03:25<00:45, 22.78s/it]
-91%|█████████ | 10/11 [03:48<00:22, 22.77s/it]
-100%|██████████| 11/11 [04:11<00:00, 22.76s/it]
-100%|██████████| 11/11 [04:11<00:00, 22.83s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 4: generating actions ...
->>> Step 4: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 5: generating actions ...
->>> Step 5: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 6: generating actions ...
->>> Step 6: interacting with world model ...
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -117,7 +89,39 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin
+9%|▉ | 1/11 [01:14<12:22, 74.22s/it]
+18%|█▊ | 2/11 [02:28<11:09, 74.33s/it]
+27%|██▋ | 3/11 [03:42<09:54, 74.32s/it]
+36%|███▋ | 4/11 [04:57<08:40, 74.32s/it]
+45%|████▌ | 5/11 [06:11<07:25, 74.28s/it]
+55%|█████▍ | 6/11 [07:25<06:10, 74.19s/it]
+64%|██████▎ | 7/11 [08:39<04:56, 74.11s/it]
+73%|███████▎ | 8/11 [09:53<03:42, 74.07s/it]
+82%|████████▏ | 9/11 [11:07<02:28, 74.06s/it]
+91%|█████████ | 10/11 [12:21<01:14, 74.01s/it]
+100%|██████████| 11/11 [13:35<00:00, 73.98s/it]
+100%|██████████| 11/11 [13:35<00:00, 74.12s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 4: generating actions ...
+>>> Step 4: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 5: generating actions ...
+>>> Step 5: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 6: generating actions ...
+>>> Step 6: interacting with world model ...
 >>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing PsdImagePlugin
-DEBUG:PIL.Image:Importing QoiImagePlugin
-DEBUG:PIL.Image:Importing SgiImagePlugin
+>>> Step 7: generating actions ...
+>>> Step 7: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
@@ -1,5 +1,5 @@
 {
     "gt_video": "unitree_g1_pack_camera/case2/unitree_g1_pack_camera_case2.mp4",
     "pred_video": "unitree_g1_pack_camera/case2/output/inference/50_full_fs6.mp4",
-    "psnr": 37.49178506869336
+    "psnr": 34.61979248212279
 }
@@ -20,6 +20,5 @@ dataset="unitree_g1_pack_camera"
 --n_iter 11 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
---fast_policy_no_decode
+--perframe_ae
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 17:46:20.925463: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 17:46:20.976293: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 17:46:20.976338: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 17:46:20.977650: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 17:46:20.985133: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 20:19:19.271045: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 20:19:19.320688: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 20:19:19.320734: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 20:19:19.322059: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 20:19:19.329606: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 17:46:21.909964: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 20:19:20.248938: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
->>> Prepared model loaded.
->>> Diffusion backbone (model.model) converted to FP16.
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,46 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
-0%| | 0/11 [00:00<?, ?it/s]
 0%| | 0/11 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-18%|█▊ | 2/11 [00:46<03:27, 23.07s/it]
-27%|██▋ | 3/11 [01:09<03:03, 22.99s/it]
-36%|███▋ | 4/11 [01:32<02:40, 22.94s/it]
-45%|████▌ | 5/11 [01:54<02:17, 22.90s/it]
-55%|█████▍ | 6/11 [02:17<01:54, 22.87s/it]
-64%|██████▎ | 7/11 [02:40<01:31, 22.85s/it]
-73%|███████▎ | 8/11 [03:03<01:08, 22.83s/it]
-82%|████████▏ | 9/11 [03:26<00:45, 22.81s/it]
-91%|█████████ | 10/11 [03:48<00:22, 22.78s/it]
-100%|██████████| 11/11 [04:11<00:00, 22.76s/it]
-100%|██████████| 11/11 [04:11<00:00, 22.86s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 4: generating actions ...
->>> Step 4: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 5: generating actions ...
->>> Step 5: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 6: generating actions ...
->>> Step 6: interacting with world model ...
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -117,7 +89,39 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin
+9%|▉ | 1/11 [01:14<12:22, 74.28s/it]
+18%|█▊ | 2/11 [02:28<11:09, 74.38s/it]
+27%|██▋ | 3/11 [03:43<09:55, 74.45s/it]
+36%|███▋ | 4/11 [04:57<08:41, 74.43s/it]
+45%|████▌ | 5/11 [06:11<07:25, 74.25s/it]
+55%|█████▍ | 6/11 [07:26<06:11, 74.31s/it]
+64%|██████▎ | 7/11 [08:40<04:57, 74.26s/it]
+73%|███████▎ | 8/11 [09:54<03:43, 74.34s/it]
+82%|████████▏ | 9/11 [11:08<02:28, 74.29s/it]
+91%|█████████ | 10/11 [12:23<01:14, 74.26s/it]
+100%|██████████| 11/11 [13:37<00:00, 74.39s/it]
+100%|██████████| 11/11 [13:37<00:00, 74.34s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 4: generating actions ...
+>>> Step 4: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 5: generating actions ...
+>>> Step 5: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 6: generating actions ...
+>>> Step 6: interacting with world model ...
 >>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing PsdImagePlugin
-DEBUG:PIL.Image:Importing QoiImagePlugin
-DEBUG:PIL.Image:Importing SgiImagePlugin
+>>> Step 7: generating actions ...
+>>> Step 7: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
@@ -1,5 +1,5 @@
 {
     "gt_video": "unitree_g1_pack_camera/case3/unitree_g1_pack_camera_case3.mp4",
     "pred_video": "unitree_g1_pack_camera/case3/output/inference/100_full_fs6.mp4",
-    "psnr": 29.88155122131729
+    "psnr": 37.034952654534486
 }
@@ -20,6 +20,5 @@ dataset="unitree_g1_pack_camera"
 --n_iter 11 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
---fast_policy_no_decode
+--perframe_ae
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 17:51:11.566934: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 17:51:11.616260: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 17:51:11.616305: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 17:51:11.617626: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 17:51:11.625103: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 20:34:34.563818: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 20:34:34.613426: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 20:34:34.613485: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 20:34:34.614802: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 20:34:34.622286: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 17:51:12.538539: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 20:34:35.540506: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
->>> Prepared model loaded.
->>> Diffusion backbone (model.model) converted to FP16.
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,46 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
-0%| | 0/11 [00:00<?, ?it/s]
 0%| | 0/11 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-18%|█▊ | 2/11 [00:46<03:26, 22.96s/it]
-27%|██▋ | 3/11 [01:08<03:03, 22.89s/it]
-36%|███▋ | 4/11 [01:31<02:40, 22.86s/it]
-45%|████▌ | 5/11 [01:54<02:16, 22.82s/it]
-55%|█████▍ | 6/11 [02:17<01:54, 22.80s/it]
-64%|██████▎ | 7/11 [02:39<01:31, 22.77s/it]
-73%|███████▎ | 8/11 [03:02<01:08, 22.75s/it]
-82%|████████▏ | 9/11 [03:25<00:45, 22.73s/it]
-91%|█████████ | 10/11 [03:47<00:22, 22.72s/it]
-100%|██████████| 11/11 [04:10<00:00, 22.73s/it]
-100%|██████████| 11/11 [04:10<00:00, 22.79s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 4: generating actions ...
->>> Step 4: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 5: generating actions ...
->>> Step 5: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 6: generating actions ...
->>> Step 6: interacting with world model ...
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -117,7 +89,39 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin
+9%|▉ | 1/11 [01:14<12:25, 74.52s/it]
|
||||||
|
18%|█▊ | 2/11 [02:29<11:15, 75.00s/it]
|
||||||
|
27%|██▋ | 3/11 [03:44<09:59, 74.99s/it]
|
||||||
|
36%|███▋ | 4/11 [04:59<08:43, 74.74s/it]
|
||||||
|
45%|████▌ | 5/11 [06:13<07:26, 74.48s/it]
|
||||||
|
55%|█████▍ | 6/11 [07:27<06:12, 74.56s/it]
|
||||||
|
64%|██████▎ | 7/11 [08:42<04:57, 74.46s/it]
|
||||||
|
73%|███████▎ | 8/11 [09:56<03:43, 74.48s/it]
|
||||||
|
82%|████████▏ | 9/11 [11:10<02:28, 74.32s/it]
|
||||||
|
91%|█████████ | 10/11 [12:23<01:13, 73.94s/it]
|
||||||
|
100%|██████████| 11/11 [13:36<00:00, 73.64s/it]
|
||||||
|
100%|██████████| 11/11 [13:36<00:00, 74.25s/it]
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 1: generating actions ...
|
||||||
|
>>> Step 1: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 2: generating actions ...
|
||||||
|
>>> Step 2: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 3: generating actions ...
|
||||||
|
>>> Step 3: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 4: generating actions ...
|
||||||
|
>>> Step 4: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 5: generating actions ...
|
||||||
|
>>> Step 5: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 6: generating actions ...
|
||||||
|
>>> Step 6: interacting with world model ...
|
||||||
>>>>>>>>>>>>>>>>>>>>>>>>
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
DEBUG:PIL.Image:Importing PsdImagePlugin
|
>>> Step 7: generating actions ...
|
||||||
DEBUG:PIL.Image:Importing QoiImagePlugin
|
>>> Step 7: interacting with world model ...
|
||||||
DEBUG:PIL.Image:Importing SgiImagePlugin
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
 {
 "gt_video": "unitree_g1_pack_camera/case4/unitree_g1_pack_camera_case4.mp4",
 "pred_video": "unitree_g1_pack_camera/case4/output/inference/200_full_fs6.mp4",
-"psnr": 35.62512454155058
+"psnr": 31.43390896360405
 }
@@ -20,6 +20,5 @@ dataset="unitree_g1_pack_camera"
 --n_iter 11 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
+--perframe_ae
---fast_policy_no_decode
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 17:56:01.170137: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 20:49:47.965949: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 17:56:01.219541: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 20:49:48.015942: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 17:56:01.219584: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 20:49:48.015997: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 17:56:01.220897: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 20:49:48.017330: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 17:56:01.228350: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 20:49:48.024854: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 17:56:02.145344: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 20:49:48.943205: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
->>> Prepared model loaded.
->>> Diffusion backbone (model.model) converted to FP16.
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,37 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
-  0%| | 0/8 [00:00<?, ?it/s]
   0%| | 0/8 [00:00<?, ?it/s]>>> Step 0: generating actions ...
- 25%|██▌ | 2/8 [00:46<02:17, 22.96s/it]
- 38%|███▊ | 3/8 [01:08<01:54, 22.88s/it]
- 50%|█████ | 4/8 [01:31<01:31, 22.82s/it]
- 62%|██████▎ | 5/8 [01:54<01:08, 22.78s/it]
- 75%|███████▌ | 6/8 [02:16<00:45, 22.76s/it]
- 88%|████████▊ | 7/8 [02:39<00:22, 22.73s/it]
-100%|██████████| 8/8 [03:02<00:00, 22.72s/it]
-100%|██████████| 8/8 [03:02<00:00, 22.79s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 4: generating actions ...
->>> Step 4: interacting with world model ...
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -108,7 +89,30 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin
+ 12%|█▎ | 1/8 [01:15<08:48, 75.51s/it]
+ 25%|██▌ | 2/8 [02:30<07:32, 75.39s/it]
+ 38%|███▊ | 3/8 [03:46<06:16, 75.35s/it]
+ 50%|█████ | 4/8 [05:00<05:00, 75.01s/it]
+ 62%|██████▎ | 5/8 [06:14<03:44, 74.68s/it]
+ 75%|███████▌ | 6/8 [07:28<02:28, 74.40s/it]
+ 88%|████████▊ | 7/8 [08:42<01:14, 74.19s/it]
+100%|██████████| 8/8 [09:55<00:00, 73.95s/it]
+100%|██████████| 8/8 [09:55<00:00, 74.47s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 4: generating actions ...
+>>> Step 4: interacting with world model ...
 >>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing SpiderImagePlugin
-DEBUG:PIL.Image:Importing SunImagePlugin
-DEBUG:PIL.Image:Importing TgaImagePlugin
+>>> Step 5: generating actions ...
+>>> Step 5: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
@@ -1,5 +1,5 @@
 {
 "gt_video": "unitree_z1_dual_arm_cleanup_pencils/case1/unitree_z1_dual_arm_cleanup_pencils_case1.mp4",
 "pred_video": "unitree_z1_dual_arm_cleanup_pencils/case1/output/inference/0_full_fs4.mp4",
-"psnr": 38.269577028444445
+"psnr": 47.911564449209735
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_dual_arm_cleanup_pencils"
 --n_iter 8 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
+--perframe_ae
---fast_policy_no_decode
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 17:59:40.132715: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 21:01:19.535243: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 17:59:40.183410: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 21:01:19.585230: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 17:59:40.183456: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 21:01:19.585275: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 17:59:40.184784: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 21:01:19.586600: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 17:59:40.192307: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 21:01:19.594107: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 17:59:41.105025: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 21:01:20.510688: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
->>> Prepared model loaded.
->>> Diffusion backbone (model.model) converted to FP16.
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,37 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
-  0%| | 0/8 [00:00<?, ?it/s]
   0%| | 0/8 [00:00<?, ?it/s]>>> Step 0: generating actions ...
- 25%|██▌ | 2/8 [00:46<02:18, 23.00s/it]
- 38%|███▊ | 3/8 [01:08<01:54, 22.94s/it]
- 50%|█████ | 4/8 [01:31<01:31, 22.86s/it]
- 62%|██████▎ | 5/8 [01:54<01:08, 22.82s/it]
- 75%|███████▌ | 6/8 [02:17<00:45, 22.78s/it]
- 88%|████████▊ | 7/8 [02:39<00:22, 22.77s/it]
-100%|██████████| 8/8 [03:02<00:00, 22.75s/it]
-100%|██████████| 8/8 [03:02<00:00, 22.83s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 4: generating actions ...
->>> Step 4: interacting with world model ...
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -108,7 +89,30 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin
+ 12%|█▎ | 1/8 [01:16<08:54, 76.34s/it]
+ 25%|██▌ | 2/8 [02:32<07:37, 76.28s/it]
+ 38%|███▊ | 3/8 [03:48<06:21, 76.24s/it]
+ 50%|█████ | 4/8 [05:04<05:04, 76.15s/it]
+ 62%|██████▎ | 5/8 [06:21<03:48, 76.24s/it]
+ 75%|███████▌ | 6/8 [07:36<02:32, 76.08s/it]
+ 88%|████████▊ | 7/8 [08:52<01:15, 75.93s/it]
+100%|██████████| 8/8 [10:09<00:00, 76.12s/it]
+100%|██████████| 8/8 [10:09<00:00, 76.14s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 4: generating actions ...
+>>> Step 4: interacting with world model ...
 >>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing SpiderImagePlugin
-DEBUG:PIL.Image:Importing SunImagePlugin
-DEBUG:PIL.Image:Importing TgaImagePlugin
+>>> Step 5: generating actions ...
+>>> Step 5: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
@@ -1,5 +1,5 @@
 {
 "gt_video": "unitree_z1_dual_arm_cleanup_pencils/case2/unitree_z1_dual_arm_cleanup_pencils_case2.mp4",
 "pred_video": "unitree_z1_dual_arm_cleanup_pencils/case2/output/inference/50_full_fs4.mp4",
-"psnr": 44.38754096950435
+"psnr": 48.344571927558974
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_dual_arm_cleanup_pencils"
 --n_iter 8 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
+--perframe_ae
---fast_policy_no_decode
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
|
|||||||
2026-02-11 18:03:19.373691: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
|
2026-02-11 21:13:04.812376: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
|
||||||
2026-02-11 18:03:19.423144: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
|
2026-02-11 21:13:04.862167: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
|
||||||
2026-02-11 18:03:19.423201: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
|
2026-02-11 21:13:04.862223: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
|
||||||
2026-02-11 18:03:19.424504: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
|
2026-02-11 21:13:04.863549: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
|
||||||
2026-02-11 18:03:19.431968: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
|
2026-02-11 21:13:04.871078: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
|
||||||
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
|
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
|
||||||
2026-02-11 18:03:20.342432: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
|
2026-02-11 21:13:05.785070: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
|
||||||
Global seed set to 123
|
Global seed set to 123
|
||||||
>>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
|
INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
|
||||||
>>> Prepared model loaded.
|
INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
|
||||||
>>> Diffusion backbone (model.model) converted to FP16.
|
INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
|
||||||
>>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
|
AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
|
||||||
>>> Encoders (cond_stage_model, embedder) converted to FP16.
|
INFO:root:Loaded ViT-H-14 model config.
|
||||||
|
INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
|
||||||
|
INFO:root:Loaded ViT-H-14 model config.
|
||||||
|
INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
|
||||||
|
>>> model checkpoint loaded.
|
||||||
|
>>> Load pre-trained model ...
|
||||||
INFO:root:***** Configing Data *****
|
INFO:root:***** Configing Data *****
|
||||||
>>> unitree_z1_stackbox: 1 data samples loaded.
|
>>> unitree_z1_stackbox: 1 data samples loaded.
|
||||||
>>> unitree_z1_stackbox: data stats loaded.
|
>>> unitree_z1_stackbox: data stats loaded.
|
||||||
@@ -28,37 +33,13 @@ INFO:root:***** Configing Data *****
|
|||||||
>>> unitree_g1_pack_camera: data stats loaded.
|
>>> unitree_g1_pack_camera: data stats loaded.
|
||||||
>>> unitree_g1_pack_camera: normalizer initiated.
|
>>> unitree_g1_pack_camera: normalizer initiated.
|
||||||
>>> Dataset is successfully loaded ...
|
>>> Dataset is successfully loaded ...
|
||||||
✓ KV fused: 66 attention layers
|
|
||||||
>>> Generate 16 frames under each generation ...
|
>>> Generate 16 frames under each generation ...
|
||||||
DEBUG:h5py._conv:Creating converter from 3 to 5
|
DEBUG:h5py._conv:Creating converter from 3 to 5
|
||||||
DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
|
DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
|
||||||
DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
|
DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
|
||||||
DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
|
DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
|
||||||
|
|
||||||
0%| | 0/8 [00:00<?, ?it/s]
|
|
||||||
0%| | 0/8 [00:00<?, ?it/s]>>> Step 0: generating actions ...
|
0%| | 0/8 [00:00<?, ?it/s]>>> Step 0: generating actions ...
|
||||||
25%|██▌ | 2/8 [00:46<02:17, 22.99s/it]
|
|
||||||
38%|███▊ | 3/8 [01:09<01:54, 22.94s/it]
|
|
||||||
50%|█████ | 4/8 [01:31<01:31, 22.89s/it]
|
|
||||||
62%|██████▎ | 5/8 [01:54<01:08, 22.84s/it]
|
|
||||||
75%|███████▌ | 6/8 [02:17<00:45, 22.82s/it]
|
|
||||||
88%|████████▊ | 7/8 [02:40<00:22, 22.81s/it]
|
|
||||||
100%|██████████| 8/8 [03:02<00:00, 22.79s/it]
|
|
||||||
100%|██████████| 8/8 [03:02<00:00, 22.86s/it]
|
|
||||||
>>> Step 0: generating actions ...
|
|
||||||
>>> Step 0: interacting with world model ...
|
|
||||||
>>>>>>>>>>>>>>>>>>>>>>>>
|
|
||||||
>>> Step 1: generating actions ...
|
|
||||||
>>> Step 1: interacting with world model ...
|
|
||||||
>>>>>>>>>>>>>>>>>>>>>>>>
|
|
||||||
>>> Step 2: generating actions ...
|
|
||||||
>>> Step 2: interacting with world model ...
|
|
||||||
>>>>>>>>>>>>>>>>>>>>>>>>
|
|
||||||
>>> Step 3: generating actions ...
|
|
||||||
>>> Step 3: interacting with world model ...
|
|
||||||
>>>>>>>>>>>>>>>>>>>>>>>>
|
|
||||||
>>> Step 4: generating actions ...
|
|
||||||
->>> Step 4: interacting with world model ...
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -108,7 +89,30 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin

+12%|█▎ | 1/8 [01:15<08:45, 75.11s/it]
+25%|██▌ | 2/8 [02:30<07:31, 75.30s/it]
+38%|███▊ | 3/8 [03:45<06:16, 75.32s/it]
+50%|█████ | 4/8 [05:01<05:01, 75.29s/it]
+62%|██████▎ | 5/8 [06:16<03:46, 75.38s/it]
+75%|███████▌ | 6/8 [07:32<02:30, 75.48s/it]
+88%|████████▊ | 7/8 [08:47<01:15, 75.39s/it]
+100%|██████████| 8/8 [10:02<00:00, 75.30s/it]
+100%|██████████| 8/8 [10:02<00:00, 75.33s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 4: generating actions ...
+>>> Step 4: interacting with world model ...
 >>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing SpiderImagePlugin
+>>> Step 5: generating actions ...
-DEBUG:PIL.Image:Importing SunImagePlugin
+>>> Step 5: interacting with world model ...
-DEBUG:PIL.Image:Importing TgaImagePlugin
+>>>>>>>>>>>>>>>>>>>>>>>>
@@ -1,5 +1,5 @@
 {
 "gt_video": "unitree_z1_dual_arm_cleanup_pencils/case3/unitree_z1_dual_arm_cleanup_pencils_case3.mp4",
 "pred_video": "unitree_z1_dual_arm_cleanup_pencils/case3/output/inference/100_full_fs4.mp4",
-"psnr": 32.29959078097713
+"psnr": 41.152374490134825
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_dual_arm_cleanup_pencils"
 --n_iter 8 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
+--perframe_ae
---fast_policy_no_decode
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 18:06:58.863806: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 21:24:42.443699: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 18:06:58.913518: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 21:24:42.494143: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 18:06:58.913565: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 21:24:42.494201: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 18:06:58.914918: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 21:24:42.495506: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 18:06:58.922497: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 21:24:42.503003: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 18:06:59.840461: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 21:24:43.415898: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
->>> Prepared model loaded.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Diffusion backbone (model.model) converted to FP16.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,37 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096

-0%| | 0/8 [00:00<?, ?it/s]
 0%| | 0/8 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-25%|██▌ | 2/8 [00:46<02:18, 23.01s/it]
-38%|███▊ | 3/8 [01:09<01:54, 22.94s/it]
-50%|█████ | 4/8 [01:31<01:31, 22.89s/it]
-62%|██████▎ | 5/8 [01:54<01:08, 22.85s/it]
-75%|███████▌ | 6/8 [02:17<00:45, 22.81s/it]
-88%|████████▊ | 7/8 [02:40<00:22, 22.79s/it]
-100%|██████████| 8/8 [03:02<00:00, 22.77s/it]
-100%|██████████| 8/8 [03:02<00:00, 22.85s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 4: generating actions ...
->>> Step 4: interacting with world model ...
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -108,7 +89,30 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin

+12%|█▎ | 1/8 [01:15<08:46, 75.28s/it]
+25%|██▌ | 2/8 [02:30<07:32, 75.34s/it]
+38%|███▊ | 3/8 [03:45<06:15, 75.08s/it]
+50%|█████ | 4/8 [04:59<04:58, 74.69s/it]
+62%|██████▎ | 5/8 [06:13<03:43, 74.43s/it]
+75%|███████▌ | 6/8 [07:27<02:28, 74.27s/it]
+88%|████████▊ | 7/8 [08:41<01:14, 74.21s/it]
+100%|██████████| 8/8 [09:55<00:00, 74.13s/it]
+100%|██████████| 8/8 [09:55<00:00, 74.43s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 4: generating actions ...
+>>> Step 4: interacting with world model ...
 >>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing SpiderImagePlugin
+>>> Step 5: generating actions ...
-DEBUG:PIL.Image:Importing SunImagePlugin
+>>> Step 5: interacting with world model ...
-DEBUG:PIL.Image:Importing TgaImagePlugin
+>>>>>>>>>>>>>>>>>>>>>>>>
@@ -1,5 +1,5 @@
 {
 "gt_video": "unitree_z1_dual_arm_cleanup_pencils/case4/unitree_z1_dual_arm_cleanup_pencils_case4.mp4",
 "pred_video": "unitree_z1_dual_arm_cleanup_pencils/case4/output/inference/200_full_fs4.mp4",
-"psnr": 45.051241961122535
+"psnr": 46.025723557253855
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_dual_arm_cleanup_pencils"
 --n_iter 8 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
+--perframe_ae
---fast_policy_no_decode
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 18:10:38.361867: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 21:36:14.761055: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 18:10:38.412126: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 21:36:14.811056: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 18:10:38.412182: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 21:36:14.811115: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 18:10:38.413493: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 21:36:14.812480: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 18:10:38.420963: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 21:36:14.820115: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 18:10:39.335981: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 21:36:15.736583: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
->>> Prepared model loaded.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Diffusion backbone (model.model) converted to FP16.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,34 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096

-0%| | 0/7 [00:00<?, ?it/s]
 0%| | 0/7 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-29%|██▊ | 2/7 [00:46<01:54, 22.99s/it]
-43%|████▎ | 3/7 [01:08<01:31, 22.92s/it]
-57%|█████▋ | 4/7 [01:31<01:08, 22.88s/it]
-71%|███████▏ | 5/7 [01:54<00:45, 22.82s/it]
-86%|████████▌ | 6/7 [02:17<00:22, 22.79s/it]
-100%|██████████| 7/7 [02:39<00:00, 22.75s/it]
-100%|██████████| 7/7 [02:39<00:00, 22.84s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -105,7 +89,27 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin

+14%|█▍ | 1/7 [01:15<07:34, 75.70s/it]
+29%|██▊ | 2/7 [02:31<06:18, 75.65s/it]
+43%|████▎ | 3/7 [03:46<05:02, 75.52s/it]
+57%|█████▋ | 4/7 [05:02<03:46, 75.47s/it]
+71%|███████▏ | 5/7 [06:17<02:30, 75.40s/it]
+86%|████████▌ | 6/7 [07:32<01:15, 75.37s/it]
+100%|██████████| 7/7 [08:48<00:00, 75.38s/it]
+100%|██████████| 7/7 [08:48<00:00, 75.44s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
 >>> Step 4: generating actions ...
-DEBUG:PIL.Image:Importing SunImagePlugin
+>>> Step 4: interacting with world model ...
-DEBUG:PIL.Image:Importing TgaImagePlugin
+>>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing TiffImagePlugin
+>>> Step 5: generating actions ...
@@ -1,5 +1,5 @@
 {
 "gt_video": "unitree_z1_dual_arm_stackbox/case1/unitree_z1_dual_arm_stackbox_case1.mp4",
 "pred_video": "unitree_z1_dual_arm_stackbox/case1/output/inference/5_full_fs4.mp4",
-"psnr": 42.717688631296596
+"psnr": 44.3480149502738
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_dual_arm_stackbox"
 --n_iter 7 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
+--perframe_ae
---fast_policy_no_decode
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 18:13:57.132827: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 21:46:41.375935: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 18:13:57.182101: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 21:46:41.426557: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 18:13:57.182156: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 21:46:41.426614: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 18:13:57.183471: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 21:46:41.427937: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 18:13:57.190931: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 21:46:41.435507: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 18:13:58.104923: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 21:46:42.361310: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
->>> Prepared model loaded.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Diffusion backbone (model.model) converted to FP16.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,34 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096

-0%| | 0/7 [00:00<?, ?it/s]
 0%| | 0/7 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-29%|██▊ | 2/7 [00:46<01:54, 22.98s/it]
-43%|████▎ | 3/7 [01:08<01:31, 22.91s/it]
-57%|█████▋ | 4/7 [01:31<01:08, 22.87s/it]
-71%|███████▏ | 5/7 [01:54<00:45, 22.84s/it]
-86%|████████▌ | 6/7 [02:17<00:22, 22.80s/it]
-100%|██████████| 7/7 [02:39<00:00, 22.77s/it]
-100%|██████████| 7/7 [02:39<00:00, 22.84s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -105,7 +89,27 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin

+14%|█▍ | 1/7 [01:16<07:38, 76.39s/it]
+29%|██▊ | 2/7 [02:33<06:23, 76.69s/it]
+43%|████▎ | 3/7 [03:50<05:07, 76.87s/it]
+57%|█████▋ | 4/7 [05:07<03:50, 76.91s/it]
+71%|███████▏ | 5/7 [06:23<02:33, 76.80s/it]
+86%|████████▌ | 6/7 [07:40<01:16, 76.77s/it]
+100%|██████████| 7/7 [08:57<00:00, 76.85s/it]
+100%|██████████| 7/7 [08:57<00:00, 76.81s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
 >>> Step 4: generating actions ...
-DEBUG:PIL.Image:Importing SunImagePlugin
+>>> Step 4: interacting with world model ...
-DEBUG:PIL.Image:Importing TgaImagePlugin
+>>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing TiffImagePlugin
+>>> Step 5: generating actions ...
@@ -1,5 +1,5 @@
 {
 "gt_video": "unitree_z1_dual_arm_stackbox/case2/unitree_z1_dual_arm_stackbox_case2.mp4",
 "pred_video": "unitree_z1_dual_arm_stackbox/case2/output/inference/15_full_fs4.mp4",
-"psnr": 44.90750363879194
+"psnr": 39.867728254007716
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_dual_arm_stackbox"
 --n_iter 7 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
+--perframe_ae
---fast_policy_no_decode
 } 2>&1 | tee "${res_dir}/output.log"
|
|||||||
@@ -1,16 +1,21 @@
|
|||||||
2026-02-11 18:17:16.023670: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
|
2026-02-11 21:57:17.623993: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
|
||||||
2026-02-11 18:17:16.073206: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
|
2026-02-11 21:57:17.673835: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
|
||||||
-2026-02-11 18:17:16.073251: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 21:57:17.673891: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 18:17:16.074552: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 21:57:17.675211: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 18:17:16.082033: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 21:57:17.682716: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 18:17:16.997362: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 21:57:18.593525: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
->>> Prepared model loaded.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Diffusion backbone (model.model) converted to FP16.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,34 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
 
-0%| | 0/7 [00:00<?, ?it/s]
 0%| | 0/7 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-29%|██▊ | 2/7 [00:46<01:55, 23.03s/it]
-43%|████▎ | 3/7 [01:09<01:31, 22.95s/it]
-57%|█████▋ | 4/7 [01:31<01:08, 22.91s/it]
-71%|███████▏ | 5/7 [01:54<00:45, 22.87s/it]
-86%|████████▌ | 6/7 [02:17<00:22, 22.84s/it]
-100%|██████████| 7/7 [02:40<00:00, 22.82s/it]
-100%|██████████| 7/7 [02:40<00:00, 22.89s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -105,7 +89,27 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin
 
+14%|█▍ | 1/7 [01:15<07:33, 75.59s/it]
+29%|██▊ | 2/7 [02:31<06:17, 75.59s/it]
+43%|████▎ | 3/7 [03:46<05:01, 75.44s/it]
+57%|█████▋ | 4/7 [05:01<03:46, 75.39s/it]
+71%|███████▏ | 5/7 [06:17<02:30, 75.35s/it]
+86%|████████▌ | 6/7 [07:32<01:15, 75.32s/it]
+100%|██████████| 7/7 [08:47<00:00, 75.24s/it]
+100%|██████████| 7/7 [08:47<00:00, 75.34s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
 >>> Step 4: generating actions ...
-DEBUG:PIL.Image:Importing SunImagePlugin
+>>> Step 4: interacting with world model ...
-DEBUG:PIL.Image:Importing TgaImagePlugin
+>>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing TiffImagePlugin
+>>> Step 5: generating actions ...
@@ -1,5 +1,5 @@
 {
 "gt_video": "unitree_z1_dual_arm_stackbox/case3/unitree_z1_dual_arm_stackbox_case3.mp4",
 "pred_video": "unitree_z1_dual_arm_stackbox/case3/output/inference/25_full_fs4.mp4",
-"psnr": 39.63695040491171
+"psnr": 39.19101039445159
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_dual_arm_stackbox"
 --n_iter 7 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
+--perframe_ae
---fast_policy_no_decode
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 18:20:35.210324: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 22:07:43.398736: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 18:20:35.259487: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 22:07:43.448264: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 18:20:35.259530: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 22:07:43.448321: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 18:20:35.260816: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 22:07:43.449636: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 18:20:35.268252: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 22:07:43.457127: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 18:20:36.181189: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 22:07:44.370935: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
->>> Prepared model loaded.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Diffusion backbone (model.model) converted to FP16.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,34 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
 
-0%| | 0/7 [00:00<?, ?it/s]
 0%| | 0/7 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-29%|██▊ | 2/7 [00:46<01:55, 23.03s/it]
-43%|████▎ | 3/7 [01:09<01:31, 22.96s/it]
-57%|█████▋ | 4/7 [01:31<01:08, 22.92s/it]
-71%|███████▏ | 5/7 [01:54<00:45, 22.89s/it]
-86%|████████▌ | 6/7 [02:17<00:22, 22.86s/it]
-100%|██████████| 7/7 [02:40<00:00, 22.84s/it]
-100%|██████████| 7/7 [02:40<00:00, 22.91s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -105,7 +89,27 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin
 
+14%|█▍ | 1/7 [01:17<07:42, 77.04s/it]
+29%|██▊ | 2/7 [02:33<06:24, 76.95s/it]
+43%|████▎ | 3/7 [03:50<05:07, 76.87s/it]
+57%|█████▋ | 4/7 [05:06<03:49, 76.59s/it]
+71%|███████▏ | 5/7 [06:24<02:33, 76.82s/it]
+86%|████████▌ | 6/7 [07:39<01:16, 76.43s/it]
+100%|██████████| 7/7 [08:55<00:00, 76.06s/it]
+100%|██████████| 7/7 [08:55<00:00, 76.44s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
 >>> Step 4: generating actions ...
-DEBUG:PIL.Image:Importing SunImagePlugin
+>>> Step 4: interacting with world model ...
-DEBUG:PIL.Image:Importing TgaImagePlugin
+>>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing TiffImagePlugin
+>>> Step 5: generating actions ...
@@ -1,5 +1,5 @@
 {
 "gt_video": "unitree_z1_dual_arm_stackbox/case4/unitree_z1_dual_arm_stackbox_case4.mp4",
 "pred_video": "unitree_z1_dual_arm_stackbox/case4/output/inference/35_full_fs4.mp4",
-"psnr": 42.34177660061245
+"psnr": 40.29563315341769
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_dual_arm_stackbox"
 --n_iter 7 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
+--perframe_ae
---fast_policy_no_decode
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 18:23:54.635983: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 22:18:17.396072: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 18:23:54.685542: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 22:18:17.446095: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 18:23:54.685587: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 22:18:17.446154: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 18:23:54.686907: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 22:18:17.447480: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 18:23:54.694405: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 22:18:17.455025: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 18:23:55.620959: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 22:18:18.367007: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
->>> Prepared model loaded.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Diffusion backbone (model.model) converted to FP16.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,46 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
 
-0%| | 0/11 [00:00<?, ?it/s]
 0%| | 0/11 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-18%|█▊ | 2/11 [00:46<03:26, 22.96s/it]
-27%|██▋ | 3/11 [01:08<03:03, 22.91s/it]
-36%|███▋ | 4/11 [01:31<02:40, 22.86s/it]
-45%|████▌ | 5/11 [01:54<02:16, 22.83s/it]
-55%|█████▍ | 6/11 [02:17<01:54, 22.80s/it]
-64%|██████▎ | 7/11 [02:39<01:31, 22.79s/it]
-73%|███████▎ | 8/11 [03:02<01:08, 22.79s/it]
-82%|████████▏ | 9/11 [03:25<00:45, 22.78s/it]
-91%|█████████ | 10/11 [03:48<00:22, 22.76s/it]
-100%|██████████| 11/11 [04:10<00:00, 22.75s/it]
-100%|██████████| 11/11 [04:10<00:00, 22.82s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 4: generating actions ...
->>> Step 4: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 5: generating actions ...
->>> Step 5: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 6: generating actions ...
->>> Step 6: interacting with world model ...
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -117,7 +89,39 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin
 
+9%|▉ | 1/11 [01:15<12:35, 75.53s/it]
+18%|█▊ | 2/11 [02:30<11:18, 75.39s/it]
+27%|██▋ | 3/11 [03:46<10:03, 75.38s/it]
+36%|███▋ | 4/11 [05:01<08:48, 75.47s/it]
+45%|████▌ | 5/11 [06:16<07:31, 75.32s/it]
+55%|█████▍ | 6/11 [07:31<06:15, 75.08s/it]
+64%|██████▎ | 7/11 [08:46<05:00, 75.07s/it]
+73%|███████▎ | 8/11 [10:00<03:44, 74.76s/it]
+82%|████████▏ | 9/11 [11:15<02:29, 74.87s/it]
+91%|█████████ | 10/11 [12:30<01:14, 74.79s/it]
+100%|██████████| 11/11 [13:45<00:00, 74.80s/it]
+100%|██████████| 11/11 [13:45<00:00, 75.02s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 4: generating actions ...
+>>> Step 4: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 5: generating actions ...
+>>> Step 5: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 6: generating actions ...
+>>> Step 6: interacting with world model ...
 >>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing PsdImagePlugin
+>>> Step 7: generating actions ...
-DEBUG:PIL.Image:Importing QoiImagePlugin
+>>> Step 7: interacting with world model ...
-DEBUG:PIL.Image:Importing SgiImagePlugin
+>>>>>>>>>>>>>>>>>>>>>>>>
@@ -1,5 +1,5 @@
 {
 "gt_video": "unitree_z1_dual_arm_stackbox_v2/case1/unitree_z1_dual_arm_stackbox_v2_case1.mp4",
 "pred_video": "unitree_z1_dual_arm_stackbox_v2/case1/output/inference/5_full_fs4.mp4",
-"psnr": 26.683000215343522
+"psnr": 25.812741419225095
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_dual_arm_stackbox_v2"
 --n_iter 11 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
+--perframe_ae
---fast_policy_no_decode
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 18:28:48.801743: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 22:33:42.261398: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 18:28:48.852069: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 22:33:42.310786: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 18:28:48.852128: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 22:33:42.310845: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 18:28:48.853466: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 22:33:42.312191: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 18:28:48.861133: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 22:33:42.319738: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 18:28:49.784354: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 22:33:43.232517: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
->>> Prepared model loaded.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Diffusion backbone (model.model) converted to FP16.
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,46 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096
 
-0%| | 0/11 [00:00<?, ?it/s]
 0%| | 0/11 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-18%|█▊ | 2/11 [00:46<03:28, 23.13s/it]
-27%|██▋ | 3/11 [01:09<03:04, 23.02s/it]
-36%|███▋ | 4/11 [01:32<02:40, 22.96s/it]
-45%|████▌ | 5/11 [01:55<02:17, 22.92s/it]
-55%|█████▍ | 6/11 [02:17<01:54, 22.88s/it]
-64%|██████▎ | 7/11 [02:40<01:31, 22.84s/it]
-73%|███████▎ | 8/11 [03:03<01:08, 22.81s/it]
-82%|████████▏ | 9/11 [03:26<00:45, 22.81s/it]
-91%|█████████ | 10/11 [03:48<00:22, 22.80s/it]
-100%|██████████| 11/11 [04:11<00:00, 22.80s/it]
-100%|██████████| 11/11 [04:11<00:00, 22.88s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 4: generating actions ...
->>> Step 4: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 5: generating actions ...
->>> Step 5: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 6: generating actions ...
->>> Step 6: interacting with world model ...
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -117,7 +89,39 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin
 
+9%|▉ | 1/11 [01:13<12:18, 73.90s/it]
+18%|█▊ | 2/11 [02:27<11:05, 73.99s/it]
+27%|██▋ | 3/11 [03:41<09:50, 73.86s/it]
+36%|███▋ | 4/11 [04:55<08:35, 73.70s/it]
+45%|████▌ | 5/11 [06:08<07:20, 73.48s/it]
+55%|█████▍ | 6/11 [07:21<06:06, 73.39s/it]
+64%|██████▎ | 7/11 [08:34<04:53, 73.28s/it]
+73%|███████▎ | 8/11 [09:47<03:39, 73.11s/it]
+82%|████████▏ | 9/11 [11:00<02:26, 73.21s/it]
+91%|█████████ | 10/11 [12:14<01:13, 73.49s/it]
+100%|██████████| 11/11 [13:28<00:00, 73.55s/it]
+100%|██████████| 11/11 [13:28<00:00, 73.50s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
|
>>> Step 3: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 4: generating actions ...
|
||||||
|
>>> Step 4: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 5: generating actions ...
|
||||||
|
>>> Step 5: interacting with world model ...
|
||||||
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
|
>>> Step 6: generating actions ...
|
||||||
|
>>> Step 6: interacting with world model ...
|
||||||
>>>>>>>>>>>>>>>>>>>>>>>>
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
DEBUG:PIL.Image:Importing PsdImagePlugin
|
>>> Step 7: generating actions ...
|
||||||
DEBUG:PIL.Image:Importing QoiImagePlugin
|
>>> Step 7: interacting with world model ...
|
||||||
DEBUG:PIL.Image:Importing SgiImagePlugin
|
>>>>>>>>>>>>>>>>>>>>>>>>
|
||||||
@@ -1,5 +1,5 @@
 {
     "gt_video": "unitree_z1_dual_arm_stackbox_v2/case2/unitree_z1_dual_arm_stackbox_v2_case2.mp4",
     "pred_video": "unitree_z1_dual_arm_stackbox_v2/case2/output/inference/15_full_fs4.mp4",
-    "psnr": 27.46347145461597
+    "psnr": 33.90444714332389
 }
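The `psnr` values in the result JSON above are peak signal-to-noise ratios between the ground-truth and predicted videos. A minimal sketch of the standard computation, assuming 8-bit pixels and a single MSE over all frames (the evaluation script's exact frame handling and averaging are assumptions here):

```python
import math

def psnr(gt_pixels, pred_pixels, max_val=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    n = len(gt_pixels)
    mse = sum((g - p) ** 2 for g, p in zip(gt_pixels, pred_pixels)) / n
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_val ** 2 / mse)

# Toy example: 32 pixels, one of them off by 16 intensity levels.
gt = [0.0] * 32
pred = [16.0] + [0.0] * 31
print(f"{psnr(gt, pred):.2f} dB")
```

Higher is better, so the jumps reported above (e.g. 27.46 → 33.90 dB) indicate the reverted inference path tracks the ground-truth video more closely.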
@@ -20,6 +20,5 @@ dataset="unitree_z1_dual_arm_stackbox_v2"
     --n_iter 11 \
     --timestep_spacing 'uniform_trailing' \
     --guidance_rescale 0.7 \
-    --perframe_ae \
-    --fast_policy_no_decode
+    --perframe_ae
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 18:33:43.119091: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 18:33:43.169099: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 18:33:43.169143: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 18:33:43.170444: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 18:33:43.177944: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 22:48:49.761688: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 22:48:49.811395: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 22:48:49.811456: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 22:48:49.812798: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 22:48:49.820307: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 18:33:44.102499: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 22:48:50.732941: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
->>> Prepared model loaded.
->>> Diffusion backbone (model.model) converted to FP16.
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,46 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096

-  0%| | 0/11 [00:00<?, ?it/s]
   0%| | 0/11 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-100%|██████████| 11/11 [04:10<00:00, 22.81s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 4: generating actions ...
->>> Step 4: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 5: generating actions ...
->>> Step 5: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 6: generating actions ...
->>> Step 6: interacting with world model ...
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -117,7 +89,39 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin

+  9%|▉ | 1/11 [01:14<12:29, 74.99s/it]
+100%|██████████| 11/11 [13:45<00:00, 75.01s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 4: generating actions ...
+>>> Step 4: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 5: generating actions ...
+>>> Step 5: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 6: generating actions ...
+>>> Step 6: interacting with world model ...
 >>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing PsdImagePlugin
-DEBUG:PIL.Image:Importing QoiImagePlugin
-DEBUG:PIL.Image:Importing SgiImagePlugin
+>>> Step 7: generating actions ...
+>>> Step 7: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
@@ -1,5 +1,5 @@
 {
     "gt_video": "unitree_z1_dual_arm_stackbox_v2/case3/unitree_z1_dual_arm_stackbox_v2_case3.mp4",
     "pred_video": "unitree_z1_dual_arm_stackbox_v2/case3/output/inference/25_full_fs4.mp4",
-    "psnr": 28.604047286947512
+    "psnr": 34.50192428908007
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_dual_arm_stackbox_v2"
     --n_iter 11 \
     --timestep_spacing 'uniform_trailing' \
     --guidance_rescale 0.7 \
-    --perframe_ae \
-    --fast_policy_no_decode
+    --perframe_ae
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 18:38:37.252690: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 18:38:37.301897: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 18:38:37.301950: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 18:38:37.303254: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 18:38:37.310679: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 23:04:15.762959: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 23:04:15.814243: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 23:04:15.814301: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 23:04:15.815653: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 23:04:15.823287: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 18:38:38.237893: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 23:04:16.742609: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
->>> Prepared model loaded.
->>> Diffusion backbone (model.model) converted to FP16.
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,46 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096

-  0%| | 0/11 [00:00<?, ?it/s]
   0%| | 0/11 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-100%|██████████| 11/11 [04:11<00:00, 22.83s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 4: generating actions ...
->>> Step 4: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 5: generating actions ...
->>> Step 5: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 6: generating actions ...
->>> Step 6: interacting with world model ...
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -117,7 +89,39 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin

+  9%|▉ | 1/11 [01:14<12:25, 74.55s/it]
+100%|██████████| 11/11 [13:39<00:00, 74.48s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 4: generating actions ...
+>>> Step 4: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 5: generating actions ...
+>>> Step 5: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 6: generating actions ...
+>>> Step 6: interacting with world model ...
 >>>>>>>>>>>>>>>>>>>>>>>>
-DEBUG:PIL.Image:Importing PsdImagePlugin
-DEBUG:PIL.Image:Importing QoiImagePlugin
-DEBUG:PIL.Image:Importing SgiImagePlugin
+>>> Step 7: generating actions ...
+>>> Step 7: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
@@ -1,5 +1,5 @@
 {
     "gt_video": "unitree_z1_dual_arm_stackbox_v2/case4/unitree_z1_dual_arm_stackbox_v2_case4.mp4",
     "pred_video": "unitree_z1_dual_arm_stackbox_v2/case4/output/inference/35_full_fs4.mp4",
-    "psnr": 25.578498826379903
+    "psnr": 38.797893493652516
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_dual_arm_stackbox_v2"
     --n_iter 11 \
     --timestep_spacing 'uniform_trailing' \
     --guidance_rescale 0.7 \
-    --perframe_ae \
-    --fast_policy_no_decode
+    --perframe_ae
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 18:43:31.592464: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
-2026-02-11 18:43:31.641865: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
-2026-02-11 18:43:31.641908: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
-2026-02-11 18:43:31.643209: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
-2026-02-11 18:43:31.650663: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+2026-02-11 23:19:36.475817: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
+2026-02-11 23:19:36.525118: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
+2026-02-11 23:19:36.525172: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
+2026-02-11 23:19:36.526479: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
+2026-02-11 23:19:36.533981: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
 To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
-2026-02-11 18:43:32.564662: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+2026-02-11 23:19:37.461985: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
->>> Prepared model loaded.
->>> Diffusion backbone (model.model) converted to FP16.
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+INFO:root:Loaded ViT-H-14 model config.
+INFO:root:Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k).
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded.
 >>> unitree_z1_stackbox: data stats loaded.
@@ -28,49 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded.
 >>> unitree_g1_pack_camera: normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG:h5py._conv:Creating converter from 3 to 5
 DEBUG:PIL.PngImagePlugin:STREAM b'IHDR' 16 13
 DEBUG:PIL.PngImagePlugin:STREAM b'pHYs' 41 9
 DEBUG:PIL.PngImagePlugin:STREAM b'IDAT' 62 4096

-  0%| | 0/12 [00:00<?, ?it/s]
   0%| | 0/12 [00:00<?, ?it/s]>>> Step 0: generating actions ...
-100%|██████████| 12/12 [04:34<00:00, 22.83s/it]
->>> Step 0: generating actions ...
->>> Step 0: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 1: generating actions ...
->>> Step 1: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 2: generating actions ...
->>> Step 2: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 3: generating actions ...
->>> Step 3: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 4: generating actions ...
->>> Step 4: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 5: generating actions ...
->>> Step 5: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 6: generating actions ...
->>> Step 6: interacting with world model ...
->>>>>>>>>>>>>>>>>>>>>>>>
->>> Step 7: generating actions ...
 >>> Step 0: interacting with world model ...
 DEBUG:PIL.Image:Importing BlpImagePlugin
 DEBUG:PIL.Image:Importing BmpImagePlugin
@@ -120,7 +89,42 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing XbmImagePlugin
 DEBUG:PIL.Image:Importing XpmImagePlugin
+DEBUG:PIL.Image:Importing XVThumbImagePlugin

+  8%|▊ | 1/12 [01:14<13:38, 74.37s/it]
+100%|██████████| 12/12 [14:42<00:00, 73.55s/it]
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 1: generating actions ...
+>>> Step 1: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 2: generating actions ...
+>>> Step 2: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 3: generating actions ...
+>>> Step 3: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 4: generating actions ...
+>>> Step 4: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 5: generating actions ...
+>>> Step 5: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 6: generating actions ...
+>>> Step 6: interacting with world model ...
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 7: generating actions ...
 >>> Step 7: interacting with world model ...
-DEBUG:PIL.Image:Importing PpmImagePlugin
-DEBUG:PIL.Image:Importing PsdImagePlugin
+>>>>>>>>>>>>>>>>>>>>>>>>
+>>> Step 8: generating actions ...
|
||||||
DEBUG:PIL.Image:Importing QoiImagePlugin
|
>>> Step 8: interacting with world model ...
|
||||||
@@ -1,5 +1,5 @@
 {
   "gt_video": "unitree_z1_stackbox/case1/unitree_z1_stackbox_case1.mp4",
   "pred_video": "unitree_z1_stackbox/case1/output/inference/5_full_fs4.mp4",
-  "psnr": 46.05271283048069
+  "psnr": 42.83913947323794
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_stackbox"
     --n_iter 12 \
     --timestep_spacing 'uniform_trailing' \
     --guidance_rescale 0.7 \
-    --perframe_ae \
-    --fast_policy_no_decode
+    --perframe_ae
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 18:48:44: TensorFlow startup (oneDNN custom operations on; cuDNN/cuFFT/cuBLAS factory registration errors; CPU build optimized, AVX2 AVX512F AVX512_VNNI FMA hint; TF-TRT Warning: Could not find TensorRT)
+2026-02-11 23:35:52: identical TensorFlow startup messages at the new timestamp
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
->>> Prepared model loaded.
->>> Diffusion backbone (model.model) converted to FP16.
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08 (logged twice)
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
+INFO:root:Loaded ViT-H-14 model config. / Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k). (logged twice)
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded. / data stats loaded.
@@ -28,49 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded. / normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG output from h5py._conv and PIL.PngImagePlugin (identical in both runs)
-[old run] denoising 0/12 → 12/12 at ~22.8 s/it; >>> Step 0 ... Step 7: generating actions ... / interacting with world model ...
@@ -120,7 +89,42 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing BlpImagePlugin ... XpmImagePlugin (identical in both runs)
+DEBUG:PIL.Image:Importing XVThumbImagePlugin
+[new run] denoising 0/12 → 12/12 at ~72.8 s/it; >>> Step 1 ... Step 8: generating actions ... / interacting with world model ...
@@ -1,5 +1,5 @@
 {
   "gt_video": "unitree_z1_stackbox/case2/unitree_z1_stackbox_case2.mp4",
   "pred_video": "unitree_z1_stackbox/case2/output/inference/15_full_fs4.mp4",
-  "psnr": 38.94694381287429
+  "psnr": 48.64571989587276
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_stackbox"
     --n_iter 12 \
     --timestep_spacing 'uniform_trailing' \
     --guidance_rescale 0.7 \
-    --perframe_ae \
-    --fast_policy_no_decode
+    --perframe_ae
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 18:53:57: TensorFlow startup (oneDNN custom operations on; cuDNN/cuFFT/cuBLAS factory registration errors; CPU build optimized, AVX2 AVX512F AVX512_VNNI FMA hint; TF-TRT Warning: Could not find TensorRT)
+2026-02-11 23:51:59: identical TensorFlow startup messages at the new timestamp
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
->>> Prepared model loaded.
->>> Diffusion backbone (model.model) converted to FP16.
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08 (logged twice)
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
+INFO:root:Loaded ViT-H-14 model config. / Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k). (logged twice)
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded. / data stats loaded.
@@ -28,49 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded. / normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG output from h5py._conv and PIL.PngImagePlugin (identical in both runs)
-[old run] denoising 0/12 → 12/12 at ~22.8 s/it; >>> Step 0 ... Step 7: generating actions ... / interacting with world model ...
@@ -120,7 +89,42 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing BlpImagePlugin ... XpmImagePlugin (identical in both runs)
+DEBUG:PIL.Image:Importing XVThumbImagePlugin
+[new run] denoising 0/12 → 12/12 at ~72.8 s/it; >>> Step 1 ... Step 8: generating actions ... / interacting with world model ...
@@ -1,5 +1,5 @@
 {
   "gt_video": "unitree_z1_stackbox/case3/unitree_z1_stackbox_case3.mp4",
   "pred_video": "unitree_z1_stackbox/case3/output/inference/25_full_fs4.mp4",
-  "psnr": 49.489774674892764
+  "psnr": 45.127553229898034
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_stackbox"
     --n_iter 12 \
     --timestep_spacing 'uniform_trailing' \
     --guidance_rescale 0.7 \
-    --perframe_ae \
-    --fast_policy_no_decode
+    --perframe_ae
 } 2>&1 | tee "${res_dir}/output.log"
@@ -1,16 +1,21 @@
-2026-02-11 18:59:09: TensorFlow startup (oneDNN custom operations on; cuDNN/cuFFT/cuBLAS factory registration errors; CPU build optimized, AVX2 AVX512F AVX512_VNNI FMA hint; TF-TRT Warning: Could not find TensorRT)
+2026-02-12 00:08:05: identical TensorFlow startup messages at the new timestamp
 Global seed set to 123
->>> Loading prepared model from ckpts/unifolm_wma_dual.ckpt.prepared.pt ...
->>> Prepared model loaded.
->>> Diffusion backbone (model.model) converted to FP16.
->>> Projectors (image_proj_model, state_projector, action_projector) converted to FP16.
->>> Encoders (cond_stage_model, embedder) converted to FP16.
+INFO:mainlogger:LatentVisualDiffusion: Running in v-prediction mode
+INFO:unifolm_wma.models.diffusion_head.conditional_unet1d:number of parameters: 5.010531e+08 (logged twice)
+AE working on z of shape (1, 4, 32, 32) = 4096 dimensions.
+INFO:root:Loaded ViT-H-14 model config. / Loading pretrained ViT-H-14 weights (laion2b_s32b_b79k). (logged twice)
+>>> model checkpoint loaded.
+>>> Load pre-trained model ...
 INFO:root:***** Configing Data *****
 >>> unitree_z1_stackbox: 1 data samples loaded. / data stats loaded.
@@ -28,49 +33,13 @@ INFO:root:***** Configing Data *****
 >>> unitree_g1_pack_camera: data stats loaded. / normalizer initiated.
 >>> Dataset is successfully loaded ...
-✓ KV fused: 66 attention layers
 >>> Generate 16 frames under each generation ...
 DEBUG output from h5py._conv and PIL.PngImagePlugin (identical in both runs)
-[old run] denoising 0/12 → 12/12 at ~22.8 s/it; >>> Step 0 ... Step 7: generating actions ... / interacting with world model ...
@@ -120,7 +89,42 @@ DEBUG:PIL.Image:Importing WmfImagePlugin
 DEBUG:PIL.Image:Importing BlpImagePlugin ... XpmImagePlugin (identical in both runs)
+DEBUG:PIL.Image:Importing XVThumbImagePlugin
+[new run] denoising 0/12 → 12/12 at ~73.0 s/it; >>> Step 1 ... Step 8: generating actions ... / interacting with world model ...
|
|||||||
@@ -1,5 +1,5 @@
 {
 "gt_video": "unitree_z1_stackbox/case4/unitree_z1_stackbox_case4.mp4",
 "pred_video": "unitree_z1_stackbox/case4/output/inference/35_full_fs4.mp4",
-"psnr": 47.18724378194084
+"psnr": 50.642542240144444
 }
@@ -20,6 +20,5 @@ dataset="unitree_z1_stackbox"
 --n_iter 12 \
 --timestep_spacing 'uniform_trailing' \
 --guidance_rescale 0.7 \
---perframe_ae \
---fast_policy_no_decode
+--perframe_ae
 } 2>&1 | tee "${res_dir}/output.log"