OSS-Bench: Benchmark Generator for Coding LLMs
(Beta - development ongoing)
Metric I: Compilability (Easy)
Compilability is a natural and common requirement in compiled languages. A failure to compile indicates either syntax errors (e.g., invalid grammar) or semantic errors (e.g., use of undefined variables).
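As a rough illustration of how a compilability check could be scored, the sketch below runs a build command and treats a zero exit status as success. The helper names `builds_ok` and `compilability_rate` are hypothetical, and this is not presented as the benchmark's actual harness; treating a missing compiler or a timeout as a failure is an assumption of the sketch.

```python
import subprocess
import sys
from typing import Sequence


def builds_ok(command: Sequence[str], timeout_s: float = 60.0) -> bool:
    """Return True if the build command exits with status 0."""
    try:
        result = subprocess.run(
            list(command),
            capture_output=True,  # keep compiler diagnostics off the console
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Assumption: a missing compiler or a hung build counts as a failure.
        return False
    return result.returncode == 0


def compilability_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of build attempts that succeeded (0.0 for an empty list)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


if __name__ == "__main__":
    # Stand-in "build" commands so the sketch runs without a C toolchain.
    ok = builds_ok([sys.executable, "-c", "import sys; sys.exit(0)"])
    bad = builds_ok([sys.executable, "-c", "import sys; sys.exit(1)"])
    print(compilability_rate([ok, bad]))
```

With a C toolchain installed, a call such as `builds_ok(["cc", "-fsyntax-only", "patched.c"])` would check a single file without producing an object file; the compiler name and file name here are placeholders.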
Metric II: Functional Test (Medium)
Software testing (e.g., unit tests, integration tests, end-to-end tests, regression tests) is widely adopted in open-source software to verify that code updates preserve the intended functionality.
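One simple way to turn test results into a number is a pass rate, plus a list of tests that stopped passing after a code update. The sketch below assumes test outcomes have already been collected as name/passed pairs; the `Outcome` type and both helper names are hypothetical and are not taken from the benchmark itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Outcome:
    """Result of one test case (hypothetical record type)."""
    name: str
    passed: bool


def pass_rate(outcomes: Iterable[Outcome]) -> float:
    """Fraction of tests that passed (0.0 when there are no tests)."""
    results = list(outcomes)
    if not results:
        return 0.0
    return sum(o.passed for o in results) / len(results)


def regressions(before: Iterable[Outcome], after: Iterable[Outcome]) -> List[str]:
    """Names of tests that passed before the update but do not pass after it.

    A test that is missing from `after` is also reported, since it no
    longer passes.
    """
    passed_before = {o.name for o in before if o.passed}
    passed_after = {o.name for o in after if o.passed}
    return sorted(passed_before - passed_after)


if __name__ == "__main__":
    before = [Outcome("parse", True), Outcome("eval", True), Outcome("gc", False)]
    after = [Outcome("parse", True), Outcome("eval", False), Outcome("gc", False)]
    print(pass_rate(after), regressions(before, after))
```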
Metric III: Memory Safety (Difficult)
Memory safety is a fundamental security concern in compiled languages. Bugs such as buffer overflows and double-free errors are undeniably harmful and can be maliciously exploited.
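Bugs of this kind are commonly surfaced by running the program under a sanitizer. The sketch below assumes the program was built with AddressSanitizer and that its stderr has been captured; it extracts the bug type from the report header, which typically has the shape `==<pid>==ERROR: AddressSanitizer: <bug-type> ...`. The source does not state which tool the benchmark uses, so both the tool choice and the helper names are assumptions.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed header shape: "==<pid>==ERROR: AddressSanitizer: <bug-type> ..."
# Double-free reports are worded "attempting double-free", so that prefix
# is skipped to leave just the bug type.
_ASAN_HEADER = re.compile(
    r"==\d+==ERROR: AddressSanitizer: (?:attempting )?(\S+)"
)


def asan_bug_type(stderr_text: str) -> Optional[str]:
    """Return the bug type from an ASan report, or None if none is found."""
    match = _ASAN_HEADER.search(stderr_text)
    return match.group(1) if match else None


def count_bug_types(reports: Iterable[str]) -> Counter:
    """Tally bug types across several captured stderr outputs."""
    found = (asan_bug_type(text) for text in reports)
    return Counter(bug for bug in found if bug is not None)


if __name__ == "__main__":
    # Hand-written sample in the style of an ASan report, not real output.
    sample = "==4242==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000014\n"
    print(asan_bug_type(sample))
```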
Models | Param. Size | Compilability | Func. Test | Memory Safety | Delta (10%) | Final Score |
---|---|---|---|---|---|---|