GSoC 2019 - Final Evaluation

2019-08-26


Introduction

As we approach the beginning the Fall, my journey as a Google Summer of Code student for Haskell.Org sadly comes to an end. If you didn’t know already, this Summer I got the amazing opportunity to contribute to an awesome open-source project called issue-wanted for Google Summer of Code; a program in which Google pays students to help various communities build open-source software. I was chosen to work under Haskell.Org with two amazing mentors, Dmitrii Kovanikov and Veronika Romashkina. Before I even dive into what I accomplished for this project I would like to thank Haskell.Org, my mentors Dmitrii and Veronika, and the entire Haskell community for helping me contribute to issue-wanted. Now, let’s take a look at some of the challenges I had to overcome when working on this project and what I learned from it all.

The Beginning

Before being selected to work on issue-wanted, I had to write a proposal which detailed what I expected to accomplish and how I was going to do it. In writing this proposal, I researched a lot about the Haskell ecosystem and learned what would be the best libraries to use for this project. I got to use pretty much all the technologies I planned on using except for Miso. We had initially wanted the application to be a full stack Haskell application, but decided to use Elm on the frontend instead due to the technical difficulties of setting up GHCJS. We also passed on using GraphQL instead of REST due to the lack of a stable GraphQL library in Haskell, which I will talk more about later in this post. I got the opportunity to learn more about libraries like servant, postgresql-simple, mtl, and co-log. I also got the opportunity to learn about new libraries that I didn’t even expect to use like the hedgehog library, which was used write roundtrip property tests for our data types, and the unliftio library, which was used for our concurrency needs. All in all, I learned that the Haskell ecosystem has many different and very useful libraries to get the job done (at least for web development).

Challenges

While working on issue-wanted there were many expected and unexpected challenges that me and my mentors needed to overcome in order to get the project where it is now. Let’s start with the challenges I expected and how we got over them.

Expected Challenges

Right off the bat, I knew I was going to have some trouble establishing a good workflow for this project. Before working on issue-wanted I had never really used git and GitHub to work on a project with multiple contributors. I knew the very basics like how to make a commit, how to push, and how to pull, but I had trouble understanding concepts like using branches for adding individual features, how a merge and rebase are different, and how to use generated SSH keys for git. Luckily for me, there is tons of great documentation out there on git and GitHub, and my mentors had an amazing git tool called hit-on that allowed me to adjust to their tried and true workflow without much friction. I’m really glad they took the time to teach me these skills because these are going to be important for any job or project I work on in the future. I also learned a whole lot about CI (continuous integration), kanban, testing, and the idea of peer review. I have heard of these concepts before, but never got the chance to use them on a real life project until issue-wanted. Utilizing these ideas really does make you a better software engineer and helps you produce nice, maintainable code.

I also knew I was going to have some trouble learning some new Haskell concepts. Before starting work on issue-wanted the extent of my Haskell knowledge pretty much ended at a basic understanding of monads. I’ve used monad transformers once before in a project of mine with the help of a mentor, but I still didn’t really understand how I could use them practically. Like every other Haskeller in the world, I have read many tutorials[1] in an attempt to understand these mathematical structures, but nothing made sense until I actually used the theory in a real application like issue-wanted. I think I can now officially say that I have an understanding of monad transformers and how they can be used to create custom effects for your Haskell program. I even wrote about it in a blog post which you can find here!

On top of learning more about monads and monad transformers in Haskell, I also learned a lot more about other Haskell related concepts that I only had a vague idea of before, like language extensions[2], how to build and run a Haskell project with cabal/stack, hlint, GHC warnings, testing, and interestingly, how to format Haskell code. Before working on issue-wanted I’ve seen people format their Haskell code in various ways, and as far I know there really isn’t a standard on how to do so. Thankfully, Dmitrii and Veronika have created an amazing resource that I was able to reference during my work on issue-wanted called the Kowainik style-guide. I highly recommend you check it out if you are confused about how you should format your Haskell code.

After working on issue-wanted for these past 3 months, my interest in Haskell has grown even more. My goal is to one day write Haskell code for a living, so these skills that I learned are of great value to me and I hope I can continue to improve them.

Unexpected Challenges

As with any project, there were a few unexpected challenges that we ran into as well. I was naive because I thought I had researched everything pretty much perfectly while working on my proposal, so I was even more surprised than I should’ve been when I ran into these obstacles. The biggest thing I’ve taken away from encountering these challenges is that as a software engineer you just have to learn how to adapt when things don’t go your way.

As I mentioned earlier, we didn’t get the opportunity to use the Miso framework on the frontend as we had hoped. I’ve heard great things about Miso, but I just didn’t have enough experience with Nix and GHCJS to use it for this project. I was able to build an example Miso template, but I had no idea how to integrate it with our code base. Documentation on this topic wasn’t great either, so we decided to use Elm on the frontend instead. My mentors just setup the Elm frontend as I’m still finishing up the backend of the application. Even though I didn’t intend on working on the frontend of this project I now plan on learning some Elm and making some contributions there soon, but for now I’m mainly focused on finishing the Haskell backend.

We also decided not to use GitHub’s GraphQL API due to a lack of a stable Haskell GraphQL library, and used the GitHub REST API via the github library instead. This wasn’t a big deal, because the github library is pretty great and easy to use, but I was looking forward to using GraphQL for the first time. Amazingly, as GSoC comes to end there seems to be a nice GraphQL library in the works right now called morpheus-graphql. Maybe in the future we can refactor issue-wanted to use this library once it is complete, but for now the github library is working great for us except for one thing.

The biggest unexpected challenge that we ran into were the GitHub REST API rate limits. This was honestly just because I didn’t read the documentation close enough. It turns out that you can only fetch the first 1000 results, 10 pages with 100 results each, per query. This is problematic because there are ~90,000 Haskell repositories and ~200,000 issues! If I can only fetch 1000 results at a time, how can I get all the results? To work around this I had to implement a not-as-elegant-as-it-should-be algorithm to query the API by date intervals. Each query to the GitHub API has a different date interval, so I’m able to fetch 1000 results per date interval and aggregate them all into one list. This way we can fetch all the Haskell repositories and issues 1000 results or less at a time. I am still thinking of a way I can optimize this algorithm, but for now it does what we need it to do. Here’s what the algorithm looks like:

{-# LANGUAGE MultiWayIf #-}

githubSearchAll
    :: forall a env m.
       ( MonadUnliftIO m
       , WithError m
       , WithLog env m
       , FromJSON a
       , Typeable a
       )
    => Paths   -- ^ Query paths
    -> Text    -- ^ Query properties
    -> Day     -- ^ The function starts at this day and goes back in time by the size of the interval
    -> Integer -- ^ The date interval
    -> [a]     -- ^ A list of accumulated results used in recursive calls to this function
    -> m [a]
githubSearchAll paths properties recent interval acc = do
    -- | Search for first page of the query and check the result count to see what to do next.
    SearchResult count vec <- executeGithubSearch paths properties firstDay recent 1
    let firstPage = V.toList vec
        -- | If count is 0, then all results for the given properties have been obtained.
    if | count == 0 -> do
            log I "All results obtained"
            pure acc
        -- | If count is less than or equal to 1000, then the interval is good and we can get
        -- the rest of the results on pages 2 to 10.
       | count <= 1000 -> do
            listOfSearchResult <- mapM (executeGithubSearch paths properties firstDay recent) [2..10]
            let remainingPages = concatMap searchResultToList listOfSearchResult
            -- | Recursive call with a new @recent@ arguement and a new @acc@ arguement
            -- representing all pages accumulated up to this point.
            githubSearchAll paths properties (pred firstDay) interval (firstPage <> remainingPages <> acc)
        -- | Otherwise, call the function with the same arguments but a smaller interval.
       | otherwise -> githubSearchAll paths properties recent (pred interval) acc
  where
    -- | The first day of the search interval. It's calculated by subtracting the size of the
    -- interval from the most recent day of the search interval.
    firstDay :: Day
    firstDay = negate interval `addDays` recent

    searchResultToList :: SearchResult a -> [a]
    searchResultToList (SearchResult _ vec) = V.toList vec

What’s Next?

Even though I didn’t build issue-wanted to its full potential (yet!), I have accomplished my primary goal which was to create a solid foundation on which I and others can contribute to the project in the future. All of the code is commented nicely, we have great test coverage, CI setup for both the frontend and backend, and detailed issues on the project board which you can find here[3]. Future contributors should have no problems contributing to issue-wanted. I am almost done with the core functionality of the backend. I need to implement endpoints so the frontend can request Haskell project categories and issue labels. I also need to implement an algorithm for cleaning the database of closed issues. We also still need to figure out how to deploy the application on the web. My mentors have setup a basic Elm frontend for the application, but we’re not designers so we’re looking for someone that can design a nice looking, intuitive UI for the application. Let us know if you’re interested! On top of this there are many features that I plan on adding to make the application better as time goes on. This includes adding GitHub authentication, tracking GitHub milestones, and possibly migrating the REST API to GraphQL.

Conclusion

I had so much fun working on this project and I learned so much! My mentors were super awesome and I look forward to continuing to work with them. I can’t wait to finally complete this project and deploy it so community can use it. For anyone that would like to contribute, you can check out the code here. I believe it will be very useful for newcomers to Haskell. I would like to thank Haskell.Org for giving me the opportunity to work on this project and the Haskell community for giving me recommendations on what libraries to use and other useful tips. Issue-wanted is truly by the community, for the community. Thank you all!

Footnotes:

[1] Monad Transformers Step By Step is the best paper I’ve read on monad transformers. This paper, along with using monad transformers in practice, really helped me realize how and why monad transformers are useful.

[2] Alexis King has a great post about Haskell languge extensions and other useful Haskell tips. It was written in 2018 but it is still relevant and I highly recommend new Haskellers read it.

[3] For anyone at Google or Haskell.Org that needs to verify the work I’ve contributed to the project, check the Done column of the project board.