Write Your Own Go Linters with Parser Package
This post is also posted on Mercari Engineering Blog as one of the entries on Merpay Advent Calendar 2019
What's a Linter? Why Do I Need One?
Linters are tools that help to improve code readability, consistency, and maintainability by catching code smell and anti-patterns. Linters play an important role in the development process by helping to reduce the time required for developers to review trivial issues such as typo, function length, and useless arguments. Linters are also useful when integrated into development processes like CI.
There are many powerful linters in the Go community like gofmt and dupl. There's also a curated list of useful linters by GolangCI. You probably won't need to use all of them, but introducing one or two might hugely boost your team's productivity. Since it's the main programming language in Merpay, we also encourage teams to integrate linters into their workflow based on requirements.
There are, however, situations where creating a custom linter could give you great value. For instance, if your company has a job interview assignment that requires a candidate to write benchmark tests for specific functions, reviewers would need to go through all cases using text search to see if they meet the criteria. Instead of doing this manually, it can be easily done with a custom linter that automatically looks for a benchmark test that includes a specific function call. Tools like this can save some time for reviewers from checking minor problems.
Wondering how to build one? Here's how.
Let's Get Started: The Basics
Before we can build something useful, we need to know how it works. For parsing Go source code, 3 packages are required: parser
, ast
, and token
. Basically, we'll need to break down the code into an abstract syntax tree before we can start an analysis. Each piece of code is treated as a node
. A node might represent a function declaration, a variable assignment, or simply a comment.
Let's say we have a file called "foo.go" with a basic Foo
function:
package foo
import "fmt"
func Foo() {
fmt.Println("Hello, Foo")
}
This file can be broken down into 3 parts: package, import, and a function. We'll retrieve the information for each part using the parser package as follows.
package main
import (
"fmt"
"go/parser"
"go/token"
)
func main() {
// Supposedly it's a subdirectory within the current directory
fpath := "./foo/foo.go"
fset := token.NewFileSet()
file, err := parser.ParseFile(fset, fpath, nil, parser.Mode(0))
if err != nil {
panic(err)
}
// A. The package name
fmt.Println("package name:", file.Name)
// B. The import
imports := file.Imports
// The type []*ImportSpec so we need to loop through the slice to
// properly go through each package being imported.
for _, i := range imports {
path := *i.Path
fmt.Println(path.Value)
}
// C. The function
//
// file.Decls returns a slice of declarations. Looping through it
// to retrieve the function declaration.
for _, decl := range file.Decls {
switch d := decl.(type) {
// There are many more other types but we only focus on FuncDecl here.
case *ast.FuncDecl:
fmt.Println("function declaration:", d.Name)
}
}
}
Running the program gives us this result:
> go run main.go
package name: foo
"fmt"
function declaration: &{<nil> <nil> Foo 0xc0000a6120 0xc00009a1e0}
When a file is parsed, you get a struct that contains a lot of information back. A detailed description can be found in the package documentation, but here's a highlight of what we have used:
type File struct {
Name *Ident // package name
Decls []Decl // top-level declarations; or nil
Imports []*ImportSpec // imports in this file
// other fields are hidden
}
The Name
field contains the name of the package. Imports
focuses only on the import section of the file, and lastly, Decls
contains everything else that is declared in the file, including but not limited to structs, variables, and functions.
Hands-On: Checking for Unnecessary Newlines
In the previous section, we quickly covered the fundamentals of ASTs in Go. However, the only way to learn to use it is to build a working library.
The objective here is to build a minimal linter that checks for unnecessary newlines at the beginning of a function. Once detected, our linter should print a message. Here is an example of a problematic file:
package foo
import "fmt"
// extra newline
func Foo() {
fmt.Println("Hello Foo")
}
// no extra newline
func Bar() {
fmt.Println("Hello Bar")
}
The program we are building should indicate the existence of the unnecessary newline at the beginning of the Foo
function, but not print any message about the Bar
function.
If we were to analyze the input as we did in the previous section, the code would not be readable because the structure would become very complicated. We do not want to stuff our implementation with a ridiculous amount of switch
statements. So, in order to do it in a more elegant way, we will introduce the use of ast.Visitor, which walks through all nodes in a file and builds a syntax tree. Within that tree, it is a lot easier to differentiate between nodes and tokens.
Using a visitor would look like the following:
package main
import (
"fmt"
"go/ast"
"go/parser"
"go/token"
)
func main() {
fpath := "./foo/foo.go"
fset := token.NewFileSet()
f, err := parser.ParseFile(fset, fpath, nil, parser.Mode(0))
if err != nil {
fmt.Println(err)
return
}
var v visitor
ast.Walk(v, f)
}
type visitor struct{}
// Visit function walks through each node in a file
func (v visitor) Visit(n ast.Node) ast.Visitor {
if n == nil {
return nil
}
fmt.Printf("%T\n", n)
return v
}
Once run, the result would be:
> go run main.go
*ast.File
*ast.Ident
*ast.GenDecl
*ast.ImportSpec
*ast.BasicLit
*ast.FuncDecl
*ast.Ident
*ast.FuncType
*ast.FieldList
*ast.BlockStmt
*ast.ExprStmt
*ast.CallExpr
*ast.SelectorExpr
*ast.Ident
*ast.Ident
*ast.BasicLit
By using the Visit
function, we will be able to see each node and its purpose. They are not very expressive by themselves so here is a picture showing which nodes we are using in this program:
Note that the fmt.Println
function call includes multiple nodes, but due to the fact that this program is only involved with newlines, most nodes are ignored.
Unfortunately, the node list does not contain an identifier for a newline, so our implementation will check the difference between:
1) The line number of the beginning of a block statement, i.e. "{".
2) The line number of the first statement within the block.
Essentially, the line number difference should be larger by one. Any number exceeding that points to the fact that there is one or more blank line(s).
After understanding the concept, the following is the complete source code, with comments, to achieve our goal.
package main
import (
"fmt"
"go/ast"
"go/parser"
"go/token"
)
func main() {
fpath := "./foo/foo.go"
fset := token.NewFileSet()
f, err := parser.ParseFile(fset, fpath, nil, parser.Mode(0))
if err != nil {
fmt.Println(err)
return
}
v := visitor{
fset: fset,
funcNames: make(map[*ast.BlockStmt]*ast.Ident),
}
ast.Walk(v, f)
}
type visitor struct {
// We need this field to save the fileset as a reference for line numbers.
fset *token.FileSet
// When a function is caught having an unnecessary line, the function name is retrieved here.
funcNames map[*ast.BlockStmt]*ast.Ident
}
func (v visitor) Visit(node ast.Node) ast.Visitor {
// nil node is skipped as it is irrelevant to our goal
if node == nil {
return nil
}
// Once we find a function, we save the function name in
// a map using its body statement as a pointer key.
if funcDecl, ok := node.(*ast.FuncDecl); ok {
blockStmt := funcDecl.Body
v.funcNames[blockStmt] = funcDecl.Name
}
if blockStmt, ok := node.(*ast.BlockStmt); ok {
// Find the line number of the beginning of a block statement.
stmtStartingPosition := blockStmt.Pos()
stmtLine := v.fset.Position(stmtStartingPosition).Line
// Find the line number of the first statement in the block.
firstStmt := blockStmt.List[0]
firstStmtStartingPosition := firstStmt.Pos()
firstStmtLine := v.fset.Position(firstStmtStartingPosition).Line
// The difference should be one. Newlines exist when it is larger.
if stmtLine+1 < firstStmtLine {
// Retrieve the function name with the pointer key we saved earlier,
// and print it.
funcName := v.funcNames[blockStmt]
fmt.Printf("Unnecessary newline at the beginning: %s\n", funcName)
}
}
return v
}
Running the code will print out the result:
> go run main.go
Unnecessary newline at the beginning: Foo
Looks great! You can try it out yourself on Go Playground.
Remember that this is an over-simplified solution. There are many scenarios that cannot be solved in this way. There is an existing library, whitespace, for solving the exact issue using a more structured approach. Check its source code out if you want to bring our tool to the next level.