Commits

Commits on Mar 12, 2024

cli_evaluate calls simple_evaluate with the same verbosity. (EleutherAI#1563)
Wongboo committed Mar 12, 2024

Commits on Mar 11, 2024

AGIEval (EleutherAI#1359)
haileyschoelkopf and Sparkier committed Mar 11, 2024
add Arabic EXAMS benchmark (EleutherAI#1498)
khalil-Hennara and lintangsutawika committed Mar 11, 2024
Update ifeval.yaml (EleutherAI#1506)
haileyschoelkopf committed Mar 11, 2024
Update generate_until_template_yaml (EleutherAI#1546)
haileyschoelkopf committed Mar 11, 2024

Commits on Mar 10, 2024

Support jinja templating for task descriptions (EleutherAI#1553)
HishamYahya and haileyschoelkopf committed Mar 10, 2024

Commits on Mar 9, 2024

Fix incorrect max_gen_toks generation kwarg default in code2_text. (EleutherAI#1551)
cosmo3769 committed Mar 9, 2024
Add compatibility for vLLM's new Logprob object (EleutherAI#1549)
Yard1 and haileyschoelkopf committed Mar 9, 2024

Commits on Mar 6, 2024

Update installation commands in openai_completions.py and contributing document and, update wandb_args description (EleutherAI#1536)
naem1023 and haileyschoelkopf committed Mar 6, 2024
Cleanup and fixes (Task, Instance, and a little bit of *evaluate) (EleutherAI#1533)
LSinev and haileyschoelkopf committed Mar 6, 2024
update printed num-fewshot ; prevent fewshots from erroneously being used by cot which hardcodes fewshot prompt (EleutherAI#1502)
haileyschoelkopf committed Mar 6, 2024
Update docs on LM.loglikelihood_rolling abstract method (EleutherAI#1532)
haileyschoelkopf committed Mar 6, 2024
Adding new task : KorMedMCQA (EleutherAI#1530)
sean0042 committed Mar 6, 2024
Add WMDP Multiple-choice (EleutherAI#1534)
justinphan3110 and lintangsutawika committed Mar 6, 2024
Add EQ-Bench as per EleutherAI#1459 (EleutherAI#1511)
pbevan1 committed Mar 6, 2024

Commits on Mar 5, 2024

Add a new task GPQA (the part CoT and generative) (EleutherAI#1482)
uanu2002 and haileyschoelkopf committed Mar 5, 2024
Openllm benchmark (EleutherAI#1526)
baberabb committed Mar 5, 2024

Commits on Mar 4, 2024

Fix minor edge cases (EleutherAI#951 EleutherAI#1503) (EleutherAI#1520)
haileyschoelkopf committed Mar 4, 2024
Hotfix: fix TypeError in --trust_remote_code (EleutherAI#1517)
haileyschoelkopf committed Mar 4, 2024
French Bench (EleutherAI#1500)
ManuelFay and haileyschoelkopf committed Mar 4, 2024
Cleaning up unused unit tests (EleutherAI#1516)
veekaybee committed Mar 4, 2024

Commits on Mar 3, 2024

Setting trust_remote_code to True for HuggingFace datasets compatibility (EleutherAI#1487)
veekaybee committed Mar 3, 2024
Vllm update DP+TP (EleutherAI#1508)
baberabb committed Mar 3, 2024

Commits on Mar 1, 2024

Commits on Feb 28, 2024

fix duplicated kwargs in some model init (EleutherAI#1495)
lchu-ibm committed Feb 28, 2024

Commits on Feb 27, 2024

Fix AttributeError in huggingface.py When 'model_type' is Missing (EleutherAI#1489)
richwardle and haileyschoelkopf committed Feb 27, 2024
update name of val split in truthfulqa multilingual (EleutherAI#1488)
haileyschoelkopf committed Feb 27, 2024
add multilingual mmlu eval (EleutherAI#1484)
jordane95 committed Feb 27, 2024
Refactor evaluater.evaluate (EleutherAI#1441)
baberabb and haileyschoelkopf committed Feb 27, 2024

Commits on Feb 26, 2024