Leaderboard
In this new leaderboard, the hyperparameter search spaces are larger, and all hyperparameters are selected according to their average performance over 3 runs. All reported results are averaged over 10 runs, with standard deviations in parentheses. -(-) marks abnormal results caused by underfitting.
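The evaluation protocol above (pick hyperparameters by their 3-run average, then report mean(std) over 10 runs) can be sketched as follows. This is a minimal illustration, not the benchmark's own code: `run_fn(config, seed)` is a hypothetical callable that trains once and returns a score, seeds `0..n-1` are assumed, and the sample standard deviation is an assumption (the leaderboard does not say whether sample or population std is reported).

```python
import statistics
from typing import Callable, Hashable, Sequence, Tuple

# Hypothetical signature: run_fn(config, seed) -> score of one training run.
RunFn = Callable[[Hashable, int], float]


def format_result(scores: Sequence[float]) -> str:
    """Format run scores as 'mean(std)' with two decimals, like the tables."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # assumption: sample standard deviation
    return f"{mean:.2f}({std:.2f})"


def select_config(configs: Sequence[Hashable], run_fn: RunFn, n_runs: int = 3) -> Hashable:
    """Return the config with the highest average score over n_runs seeds."""
    best_cfg, best_avg = None, float("-inf")
    for cfg in configs:
        avg = statistics.mean([run_fn(cfg, seed) for seed in range(n_runs)])
        if avg > best_avg:
            best_cfg, best_avg = cfg, avg
    return best_cfg


def evaluate(
    configs: Sequence[Hashable],
    run_fn: RunFn,
    n_select: int = 3,
    n_report: int = 10,
) -> Tuple[Hashable, str]:
    """Select a config by its n_select-run average, then report n_report runs."""
    best = select_config(configs, run_fn, n_select)
    scores = [run_fn(best, seed) for seed in range(n_report)]
    return best, format_result(scores)


if __name__ == "__main__":
    # Toy stand-in for training: the score depends on the config and the seed.
    toy_run = lambda cfg, seed: cfg * 10 + seed
    print(evaluate([1, 2], toy_run))
```

With the toy `run_fn`, config `2` wins the 3-run selection and its 10 runs (scores 20 to 29) are summarized as `24.50(3.03)`.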
Graph-level

| GOOD-HIV | scaffold | | size | |
|---|---|---|---|---|
| | covariate | concept | covariate | concept |
| ERM | 69.55(2.39) | 72.48(1.26) | 59.19(2.29) | 61.91(2.29) |
| IRM | 70.17(2.78) | 71.78(1.37) | 59.94(1.59) | -(-) |
| VREx | 69.34(3.54) | 72.21(1.42) | 58.49(2.28) | 61.21(2.00) |
| GroupDRO | 68.15(2.84) | 71.48(1.27) | 57.75(2.86) | 59.77(1.95) |
| Coral | 70.69(2.25) | 72.96(1.06) | 59.39(2.90) | 60.29(2.50) |
| DANN | 69.43(2.42) | 71.70(0.90) | 62.38(2.65) | 65.15(3.13) |
| Mixup | 70.65(1.86) | 71.89(1.73) | 59.11(3.11) | 62.80(2.43) |
| DIR | 68.44(2.51) | 71.40(1.48) | 57.67(3.75) | 74.39(1.45) |
| GSAT | 70.07(1.76) | 72.51(0.97) | 60.73(2.39) | 56.96(1.76) |

| GOOD-Motif | basis | | size | |
|---|---|---|---|---|
| | covariate | concept | covariate | concept |
| ERM | 63.80(10.36) | 81.31(0.69) | 53.46(4.08) | 70.83(0.79) |
| IRM | 59.93(11.46) | 80.37(0.80) | 53.68(4.11) | 70.15(0.64) |
| VREx | 66.53(4.04) | 81.34(0.75) | 54.47(3.42) | 70.58(1.16) |
| GroupDRO | 61.96(8.27) | 81.00(0.60) | 51.69(2.22) | 70.35(0.40) |
| Coral | 66.23(9.01) | 81.47(0.49) | 53.71(2.75) | 70.52(0.59) |
| DANN | 51.54(7.28) | 81.43(0.60) | 51.86(2.44) | 70.74(0.65) |
| Mixup | 69.67(5.86) | 77.64(0.58) | 51.31(2.56) | 68.21(0.89) |
| DIR | 39.99(5.50) | 82.96(4.47) | 44.83(4.00) | 54.96(9.32) |
| GSAT | 55.13(5.41) | 75.30(1.57) | 60.76(5.94) | 59.00(3.42) |

| GOOD-CMNIST | color | |
|---|---|---|
| | covariate | concept |
| ERM | 27.82(3.24) | 42.90(0.67) |
| IRM | 29.04(2.10) | 42.73(0.71) |
| VREx | 27.65(2.31) | 43.22(0.64) |
| GroupDRO | 29.23(2.12) | 43.33(0.67) |
| Coral | 29.47(3.15) | 42.98(0.59) |
| DANN | 28.77(1.49) | 42.84(0.61) |
| Mixup | 28.30(1.74) | 40.70(0.56) |
| DIR | 26.20(4.48) | 28.71(4.66) |
| GSAT | 35.62(5.52) | 47.58(1.15) |