Predictions Evaluation#
from pathlib import Path
import numpy as np
import pandas as pd
from utils.smooth_bleu import bleu_fromstr
def analyze_preds(base_file, sample_size=5):
    # read the two prediction files that share the same base name
    hf_preds_file = Path(base_file).with_suffix('.hf_pred.csv')
    fine_tuned_file = Path(base_file).with_suffix('.finetuned_pred.csv')
    hf_preds = pd.read_csv(hf_preds_file)
    fine_tuned = pd.read_csv(fine_tuned_file)
    # combine code, both predictions, and the human review into one dataframe
    df = pd.DataFrame({'code': fine_tuned['code'],
                       'hf_pred': hf_preds['prediction'],
                       'fine_tuned_pred': fine_tuned['prediction'],
                       'target': fine_tuned['target']})
    # note: replace returns a new dataframe, so the result must be assigned
    df = df.replace(np.nan, '', regex=True)
    # print a reproducible sample with both predictions
    sample = df.sample(sample_size, random_state=42)
    for code, hf_pred, fine_tuned_pred, target in sample.to_numpy():
        print('-------------------')
        print(code)
        print(f'## Human Review: {target}')
        print(f'## HF Pred: {hf_pred}')
        print(f'## Fine Tuned Pred: {fine_tuned_pred}')
    return df
def calc_bleu(df):
    refs = list(df['target'])
    preds = list(df['prediction'])
    for i in range(len(preds)):
        # split out punctuation identically for predictions and references
        chars = "(_)`."
        for c in chars:
            preds[i] = preds[i].replace(c, " " + c + " ")
            preds[i] = " ".join(preds[i].split())
            refs[i] = refs[i].replace(c, " " + c + " ")
            refs[i] = " ".join(refs[i].split())
    return bleu_fromstr(preds, refs, rmstop=False)
def calc_bleu_score(base_file):
    hf_preds_file = Path(base_file).with_suffix('.hf_pred.csv')
    fine_tuned_file = Path(base_file).with_suffix('.finetuned_pred.csv')
    hf_preds = pd.read_csv(hf_preds_file)
    ft_preds = pd.read_csv(fine_tuned_file)
    hf_preds.replace(np.nan, '', regex=True, inplace=True)
    ft_preds.replace(np.nan, '', regex=True, inplace=True)
    hf_bleu = calc_bleu(hf_preds)
    ft_bleu = calc_bleu(ft_preds)
    print(f'HF BLEU: {hf_bleu}')
    print(f'Fine Tuned BLEU: {ft_bleu}')
    return hf_bleu, ft_bleu
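As a quick sanity check of the preprocessing in `calc_bleu`, here is a minimal, dependency-free sketch of the same punctuation-splitting step, using the same character set as above (this is just an illustration of the tokenization, not part of the evaluation code):

```python
def normalize(text: str) -> str:
    # Pad the punctuation characters used in calc_bleu with spaces so
    # they become separate tokens, then collapse repeated whitespace.
    for c in "(_)`.":
        text = text.replace(c, " " + c + " ")
    return " ".join(text.split())

print(normalize("use `clearTimeout` here."))
# -> use ` clearTimeout ` here .
```

Splitting identifiers and punctuation into separate tokens this way makes the BLEU comparison less sensitive to formatting differences between the prediction and the reference.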
Qualitative Evaluation#
We now compare the predictions of the HF model and the fine-tuned model on samples from the four datasets.
For each sampled diff we print the code, the human review comment, and the predictions of both models.
df = {}
df['msg'] = analyze_preds('../data/msg-test')
-------------------
@@ -1,6 +1,7 @@
# frozen_string_literal: true
require 'hocon'
+require 'bolt/error'
class TransportConfig
attr_accessor :host, :port, :ssl_cert, :ssl_key, :ssl_ca_cert, :ssl_cipher_suites,
## Human Review: Looks like this isn't used in this file?
## HF Pred: Why do we need this?
## Fine Tuned Pred: I am surprised you don't use `hocon` here.
-------------------
@@ -92,7 +92,7 @@ public final class OAuth2AuthorizedClientArgumentResolver implements HandlerMeth
clientRegistrationId.flatMap(id -> Mono.error(new IllegalStateException(
"Unable to resolve the Authorized Client with registration identifier \""
+ id
- + "\". An \"authenticated\" or \"unauthenticated\" session is required. To allow for unauthenticated access, ensure ServerHttpSecurity.anonymous() is configured."))))
+ + "\". An \"authenticated\" or \"anonymous\" request is required. To allow for anonymous access, ensure ServerHttpSecurity.anonymous() is configured."))))
.flatMap(zipped -> {
String registrationId = zipped.getT1();
String username = zipped.getT2();
## Human Review: I don't think this should be changed since on the reactive side we don't support anonymous users.
## HF Pred: I don't think we should change this message. It's not clear to me why this is an error.
## Fine Tuned Pred: I think this message should be updated. `An "anonymous" request is required. To allow for anonymous access, ServerHttpSecurity.anonymous() is configured.`
-------------------
@@ -3,7 +3,7 @@ const test = require('./shared').assert,
setupDatabase = require('./shared').setupDatabase,
Script = require('vm'),
expect = require('chai').expect,
- normalizedFunctionString = require('bson/lib/bson/parser/utils').normalizedFunctionString,
+ normalizedFunctionString = require('bson/lib/parser/utils').normalizedFunctionString,
Buffer = require('safe-buffer').Buffer;
const {
## Human Review: feel free when editing sections like this to introduce modern features like object destructuring.
## HF Pred: Why this change?
## Fine Tuned Pred: whats with the whitespace changes in this file?
-------------------
@@ -0,0 +1,17 @@
+<?php
+/**
+ * Copyright © Bold Brand Commerce Sp. z o.o. All rights reserved.
+ * See LICENSE.txt for license details.
+ */
+
+declare(strict_types=1);
+
+namespace Ergonode\Core\Infrastructure\Exception;
+
+class SerializationException extends SerializerException
+{
+ public function __construct(string $message, \Throwable $previous = null)
+ {
+ parent::__construct($message, $previous);
+ }
+}
## Human Review: I think it should been in `SharedKernel` module.
## HF Pred: Missing license header.
## Fine Tuned Pred: Why do we need a separate exception class here? Seems very specific to Elgg\Core\Infrastructure\Exception
-------------------
@@ -473,6 +473,7 @@ describe('GridFS Stream', function () {
// Fail if user tries to abort an aborted stream
uploadStream.abort().then(null, function (error) {
expect(error.toString()).to.equal(
+ // TODO(NODE-3405): Replace with MongoStreamClosedError
'MongoDriverError: Cannot call abort() on a stream twice'
);
client.close(done);
## Human Review: You can remove these if they've been resolved in NODE-3405 and this isn't depending on it
## HF Pred: Is this TODO still relevant?
## Fine Tuned Pred: What is the problem here?
df['vscode'] = analyze_preds('../data/microsoft_vscode_1000.csv')
-------------------
@@ -340,6 +341,17 @@ export class DebugEditorContribution implements IDebugEditorContribution {
return new RunOnceScheduler(() => this.hoverWidget.hide(), HOVER_DELAY);
}
+ @memoize
+ private get provideNonDebugHoverScheduler(): RunOnceScheduler {
+ return new RunOnceScheduler(() => {
## Human Review: Where is this disposed?
## HF Pred: Why is this memoized?
## Fine Tuned Pred: This is not `Runnable` but `RunOnceScheduler`, right?
-------------------
@@ -53,6 +53,11 @@ export class SassParser extends cssParser.Parser {
// Sass variables: $font-size: 12px;
public _parseVariableDeclaration(panic:scanner.TokenType[]=[]): nodes.VariableDeclaration {
+ var cssVariableDeclaration= super._parseCssVariableDeclaration(panic);
## Human Review: That looks wrong. Not all places where you can declare a sass variable are also suited to declare a css variable.
## HF Pred: Nit: space after `=`
## Fine Tuned Pred: Minor style issue: missing space before the `=`
-------------------
@@ -537,6 +537,23 @@ export interface CompletionList {
*/
items: CompletionItem[];
}
+
+/**
+ * Contains additional information about the context in which
+ * [completion provider](#CompletionItemProvider.provideCompletionItems) is triggered.
+ */
+export interface CompletionContext {
## Human Review: This also needs the following information *manual* invocation, *24x7* IntelliSense, completing incomplete result set
## HF Pred: I don't think we need this interface. We can just use `CompletionItemProvider`.
## Fine Tuned Pred: Is there a reason this is not an object?
-------------------
@@ -439,6 +441,16 @@ export class DebugService implements debug.IDebugService {
});
}
+ private debouncedDisplayThreads(session: RawDebugSession) {
+ const timer = this.displayThreadsTimer.get(session.getId());
+ if (timer) {
+ clearTimeout(timer);
+ }
+ this.displayThreadsTimer.set(session.getId(), setTimeout(() => {
+ this.fetchThreads(session).done(undefined, errors.onUnexpectedError);
## Human Review: This will make
## HF Pred: I'm not sure if this is the best way to do this. I think it would be better to do this in the `fetchThreads` function.
## Fine Tuned Pred: Is there a reason we can't just use `clearTimeout` here? Is there any difference in the code between `clearTimeout` and `set`?
-------------------
@@ -265,4 +266,33 @@ class MDDocumentContentProvider implements TextDocumentContentProvider {
}, 300);
}
}
+}
+
+class DocumentHeadingsProvider implements vscode.DocumentSymbolProvider {
+
+ // http://daringfireball.net/projects/markdown/syntax#header
+ private static _atxPattern = /^(#){1,6}\s+.+(\s+\1)?/;
+ private static _settext = /^\s*[-=]+\s*$/;
## Human Review: Make sure to also test that this does not pick up separators
## HF Pred: nit: `_atxPattern` -> `_atxRegex`
## Fine Tuned Pred: Remove this and use directly the `SymbolProvider` below.
df['kotlin'] = analyze_preds('../data/JetBrains_kotlin_1000.csv')
-------------------
@@ -24,10 +24,13 @@ abstract class AbstractIrLineNumberTest : AbstractLineNumberTest() {
override fun compareCustom(psiFile: KtFile, wholeFile: File) {
val fileText = psiFile.text
- val expectedLineNumbers = normalize(
- fileText.substring(fileText.indexOf("//") + 2)
- .trim().split(" ").map { it.trim() }.toMutableList()
- )
+ val expectedLineNumbers = fileText.split("\n".toRegex()).filter { line ->
## Human Review: How about simplifying this to:
val expectedLineNumbers = normalize(
fileText.substring(Regex("// \\d+").find(fileText)!!.range.start + 2)
.trim().split(" ").map { it.trim() }.toMutableList()
)
Then we are looking for a line that starts with "// " with a number of digits after it. That should work to exclude the comment lines with text. That looks consistent with what the old backend expects from these tests.
## HF Pred: Why did you change this?
## Fine Tuned Pred: Remove the `toRegex()` call here.
-------------------
@@ -45,7 +45,6 @@ import org.jetbrains.kotlin.types.KotlinType
import org.jetbrains.kotlin.types.typeUtil.isPrimitiveNumberType
import org.jetbrains.kotlin.types.typeUtil.makeNotNullable
import org.jetbrains.kotlin.types.typeUtil.makeNullable
-import java.lang.AssertionError
## Human Review: Pls remove this file from pr
## HF Pred: import org.jetbrains.kotlin.util.KotlinConstants;
## Fine Tuned Pred: Why did you remove this import?
-------------------
@@ -46,6 +46,7 @@ abstract class AbstractCheckLocalVariablesTableTest : CodegenTestCase() {
assertNotNull("Couldn't find class file for pattern $classFileRegex in: $pathsString", outputFile)
val actualLocalVariables = readLocalVariable(ClassReader(outputFile.asByteArray()), methodName)
+ checkLocalVariableTypes(ClassReader(outputFile.asByteArray()), methodName, actualLocalVariables)
## Human Review: Could we reuse class reader from previous line?
## HF Pred: assertEquals(expectedLocalVariables, actualLocalVariables)
## Fine Tuned Pred: What's the purpose of this line?
-------------------
@@ -37,3 +37,17 @@ public fun Char.equals(other: Char, ignoreCase: Boolean = false): Boolean {
* Returns `true` if this character is a Unicode surrogate code unit.
*/
public fun Char.isSurrogate(): Boolean = this in Char.MIN_SURROGATE..Char.MAX_SURROGATE
+
+/**
+ * Minimum value for character
+ * @see <a href="https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html">
## Human Review: I don't think this way of linking is supported in `@see` tag. I'm not sure what is the purpose of these links anyway.
## HF Pred: This should be `Char.isSurrogate()`.
## Fine Tuned Pred: Please remove the `<a>` tag.
-------------------
@@ -175,7 +195,9 @@ class ExpressionCodegen(
override fun visitBlockBody(body: IrBlockBody, data: BlockInfo): StackValue {
return body.statements.fold(none()) { _, exp ->
- exp.accept(this, data)
+ val result = exp.accept(this, data)
+ (exp as? IrExpression)?.markEndOfStatementIfNeeded()
## Human Review: ```
exp.accept(this, data).also {
(exp as? IrExpression)?.markEndOfStatementIfNeeded()
}
```
## HF Pred: Why is this needed?
## Fine Tuned Pred: Unrelated to this PR, but this doesn't seem like the best place to markEndOfStatementIfNeeded. Should the markEndOfStatementIfNeeded be in some other place, or should the markEndOfStatementIfNeeded be in BlockInfoResolver?
df['uppy'] = analyze_preds('../data/transloadit_uppy_1000.csv')
-------------------
@@ -28,6 +28,7 @@
"@uppy/thumbnail-generator": "0.29.1",
"@uppy/utils": "0.29.1",
"classnames": "^2.2.6",
+ "cuid": "^2.1.1",
"drag-drop": "2.13.3",
## Human Review: In other places we are using uuid, I think, is there a reason to use different packages? I was actually gonna suggest we switch to https://github.com/ai/nanoid everywhere, it’s smaller and ids are smaller.
## HF Pred: This should be `^2.1.0`, right?
## Fine Tuned Pred: Why are we using a fixed version here?
-------------------
@@ -49,7 +49,8 @@ class Tabs extends Component {
tabindex="-1"
type="file"
name="files[]"
- multiple="true"
+ multiple={this.props.maxNumberOfFiles !== 1 || !this.props.maxNumberOfFiles}
## Human Review: same here
## HF Pred: `multiple={this.props.maxNumberOfFiles !== 1}`
## Fine Tuned Pred: shouldnt it be only true if `this.props.maxNumberOfFiles` is defined ?
-------------------
@@ -496,6 +494,25 @@ module.exports = class ProviderView {
}
}
+ listAllFiles (path, files = null) {
+ files = files || []
+ return new Promise((resolve) => {
+ this.provider.list(path).then((res) => {
## Human Review: should we add a `catch` for this call. Otherwise what will happen when the call to `this.provider.list` fails?
## HF Pred: `files = files || []`
## Fine Tuned Pred: `resolve` is unnecessary here, you can just call `this.provider.list(path, files)`
-------------------
@@ -422,7 +422,7 @@ module.exports = class ProviderView {
}
handleAuth () {
- const authState = btoa(JSON.stringify({ origin: location.origin }))
+ const authState = btoa(JSON.stringify({ origin: location.origin, redirect: 'http://localhost:3' }))
const link = `${this.provider.authUrl()}?state=${authState}`
## Human Review: `http://localhost:3` 🤔
## HF Pred: I think this should be `https://localhost:3`
## Fine Tuned Pred: I think it would be better if we could use `location.protocol` here instead of hardcoding `localhost:3`
-------------------
@@ -2,6 +2,14 @@
echo "Preparing for end to end test: copying static HTML and CSS, building JS"
rm -rf ./test/endtoend/dist && mkdir ./test/endtoend/dist
-cp ./dist/uppy.min.css ./test/endtoend/dist
-cp ./test/endtoend/src/index.html ./test/endtoend/dist
+rm -rf ./test/endtoend/node_modules
+
+UPPY_VERSION=$(cat package.json | grep version | head -1 | awk -F= "{ print $2 }" | sed 's/[version:,\",]//g' | tr -d '[[:space:]]')
+# archive the uppy package
+npm pack
## Human Review: I think we should also run the `npm run prepublishOnly` script first here, so that the package is exactly the same as the published one (I thought this would happen automatically inside `npm pack` but it doesn't seem to)
## HF Pred: Why do we need this?
## Fine Tuned Pred: Do we want to keep the old version or is there a way to upgrade to a newer version?
As we can see, both models are pretty mediocre at understanding the actual issue. But the HF model tends to produce more generic predictions, while the fine-tuned model produces predictions that are more specific to the code and show a better understanding of it.
Still, both models predict something sensible but struggle to pinpoint the problem.
Quantitative Evaluation#
For each dataset, we calculate the BLEU-4 score for the predictions of the HF model and the fine-tuned model. The BLEU score measures how similar the predictions are to the targets; the higher the score, the better the predictions.
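To make the metric concrete, here is a rough, self-contained sketch of sentence-level BLEU-4 with add-one smoothing. The actual `bleu_fromstr` is a corpus-level smoothed BLEU and differs in detail, so the numbers below would not match it; `ngrams` and `bleu4` are hypothetical helpers for illustration only.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # multiset of n-grams of a token list
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(pred: str, ref: str) -> float:
    # Geometric mean of 1- to 4-gram precisions with add-one smoothing,
    # multiplied by a brevity penalty that punishes short predictions.
    p_tok, r_tok = pred.split(), ref.split()
    log_prec = 0.0
    for n in range(1, 5):
        p_ng, r_ng = ngrams(p_tok, n), ngrams(r_tok, n)
        overlap = sum((p_ng & r_ng).values())   # clipped n-gram matches
        total = sum(p_ng.values())
        log_prec += math.log((overlap + 1) / (total + 1))
    bp = min(1.0, math.exp(1 - len(r_tok) / max(len(p_tok), 1)))
    return bp * math.exp(log_prec / 4)
```

A perfect match scores 1.0, while a prediction sharing no n-grams with the reference scores close to 0, which is why review-comment generation, with its huge space of acceptable phrasings, yields low absolute scores.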
calc_bleu_score('../data/msg-test')
Total: 10169
Total: 10169
HF BLEU: 5.14
Fine Tuned BLEU: 5.34
(5.14, 5.34)
calc_bleu_score('../data/microsoft_vscode_1000.csv')
Total: 1000
Total: 1000
HF BLEU: 3.39
Fine Tuned BLEU: 4.02
(3.39, 4.02)
calc_bleu_score('../data/JetBrains_kotlin_1000.csv')
Total: 1000
HF BLEU: 3.39
Fine Tuned BLEU: 4.32
Total: 1000
(3.39, 4.32)
calc_bleu_score('../data/transloadit_uppy_1000.csv')
Total: 1000
HF BLEU: 4.55
Fine Tuned BLEU: 4.84
Total: 1000
(4.55, 4.84)
As we can see, the fine-tuned model performs slightly better than the HF model on all datasets.
Nevertheless, the score is still pretty low (as the authors of [LLG+22] put it: “it is a hard task”).