Welcome to Julia
Julia is now installed across all compute servers of Wharton’s HPC systems. From the Julia Website:
Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. Julia’s Base library, largely written in Julia itself, also integrates mature, best-of-breed open source C and Fortran libraries for linear algebra, random number generation, signal processing, and string processing. In addition, the Julia developer community is contributing a number of external packages through Julia’s built-in package manager at a rapid pace. IJulia, a collaboration between the Jupyter and Julia communities, provides a powerful browser-based graphical notebook interface to Julia.
To build a bit of documentation, I worked through parts of the excellent Quantitative Economics Julia lecture and adapted them for our environment (HPCC) below. If you:
- want to run Julia code non-interactively, please go directly to Using Julia with Scripts (non-interactive Julia)
- want to run Julia interactively, but not in the GUI ‘notebook’, please go directly to Interactive Julia
Setting Up Your Jupyter Notebook for Julia (GUI interactive Julia)
On hpcc.wharton.upenn.edu:
    (login-server) $ setup-jupyter-notebook.sh
    Creating Wharton HPCC Jupyter notebook environment... PLEASE WAIT...
… the above can take a long time, DON’T INTERRUPT, output chopped for brevity …
    (login-server) $ qlogin
    (compute-server) $ source /opt/rh/rh-python35/enable
    (compute-server) $ source ~/.virtualenvs/jupyter-notebook-py35/bin/activate
    (compute-server) $ julia
                   _
       _       _ _(_)_     |  A fresh approach to technical computing
      (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
       _ _   _| |_  __ _   |  Type "?help" for help.
      | | | | | | |/ _` |  |
      | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
     _/ |\__'_|_|_|\__'_|  |
    |__/                   |  x86_64-redhat-linux
    julia> Pkg.add("IJulia")
    INFO: Initializing package repository /home/wcit/hughmac/.julia/v0.4
    ... takes quite a long time, output chopped for brevity ...
    julia> quit()
    (compute-server) $ exit
    (login-server) $ notebook-py35
… carefully follow the instructions presented …
NOTE: run the ssh hpcc … port forward in a separate window on your local (desktop or laptop) system: a Terminal window on OS X, or a MobaXterm window on Windows.
Once your Jupyter notebook web page is up, choose New (pulldown menu) > Julia.
If you’re starting up on a new day, just:
- log on to HPCC
- run: $ notebook-py35
- follow the instructions
- in the browser: New (pulldown menu) > Julia
So that’s the whirlwind GUI Notebook interactive setup, which should get you through “Setting up Your Julia Environment” in the Quantitative Economics Julia lecture.
Using Julia Interactively from the Command Line (Interactive Julia)
You can run Julia interactively from the command line. It’s simple! To start, from hpcc.wharton.upenn.edu:
    (login-server) $ qlogin
    (compute-server) $ julia
                   _
       _       _ _(_)_     |  A fresh approach to technical computing
      (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
       _ _   _| |_  __ _   |  Type "?help" for help.
      | | | | | | |/ _` |  |
      | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
     _/ |\__'_|_|_|\__'_|  |
    |__/                   |  x86_64-redhat-linux
    julia>
Now that you’ve got Julia started, I recommend going through the comprehensive Quantitative Economics Julia lecture. You can skip downloading and installing …
Using Julia with Scripts (non-interactive Julia)
Take a look at the Julia demo files in /usr/local/demo/Julia on the HPCC. It works similarly to other research software in the HPCC environment:
- create a Julia script file (.jl)
- create a job script file (.sh) that calls the Julia script file (.jl)
- submit the job script file with ‘qsub’
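The three steps above can be sketched as follows. This is only an illustration: the file names hello.jl and hello_job.sh are hypothetical, and the ‘#$ -N’ and ‘#$ -j y’ directives are borrowed from the array job script later on this page. The demo files in /usr/local/demo/Julia remain the reference examples.

```shell
#!/bin/bash
# Sketch only: hello.jl and hello_job.sh are hypothetical file names.

# 1. Create a Julia script file (.jl)
cat > hello.jl <<'EOF'
println("Hello from Julia")
EOF

# 2. Create a job script file (.sh) that calls the Julia script file
cat > hello_job.sh <<'EOF'
#!/bin/bash
#$ -N hello_julia
#$ -j y
julia hello.jl
EOF

# 3. Submit the job script file with qsub (only where qsub exists, i.e. on the HPCC)
if command -v qsub >/dev/null 2>&1; then
    qsub hello_job.sh
else
    echo "qsub not found; run this step on the HPCC login server"
fi
```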
Using Multiple Cores with Julia
You can use ‘-pe openmp #’ as an option to qsub (or ‘#$ -pe openmp #’ in your bash job script), where ‘#’ is the number of cores, up to 16 (the number of cores on a box). Keep in mind that the more cores you request, the longer the job may take to start, as that many cores must become available on the same box in the cluster. It might be ‘worth the wait’ for a longer-running job, but likely not for short-running jobs.
Then you would read the environment variable ‘NSLOTS’ using some Julia code like:
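    #!/usr/bin/env julia
    # Julia 0.6 syntax. (On Julia 1.x, add `using Distributed`
    # and use @distributed in place of @parallel.)
    slots = parse(Int, ENV["NSLOTS"])   # cores granted by the scheduler
    addprocs(slots)                     # start one worker process per core
    # Flip 20 billion coins in parallel and sum the heads
    nheads = @parallel (+) for i=1:20000000000
        Int(rand(Bool))
    end
    print(nheads)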
Again, you would launch that with something like:
qsub -pe openmp 8 -N parjulia -j y -b y myparcode.jl
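The same request can also be written into a bash job script, using the ‘#$ -pe openmp #’ form mentioned above. A minimal sketch, assuming a hypothetical file name parjulia_job.sh; the fallback to 1 core when NSLOTS is unset is an added convenience for trying the script outside the scheduler, not something the cluster requires.

```shell
#!/bin/bash
# filename: parjulia_job.sh (hypothetical name)
#$ -N parjulia
#$ -j y
#$ -pe openmp 8

# Inside a job, the scheduler sets NSLOTS to the number of cores granted.
# Fall back to 1 so the script can also be tried outside a job.
CORES="${NSLOTS:-1}"
echo "Running with $CORES core(s)"

# myparcode.jl reads NSLOTS itself, so export the value for it.
export NSLOTS="$CORES"
if command -v julia >/dev/null 2>&1 && [ -f myparcode.jl ]; then
    julia myparcode.jl
else
    echo "julia or myparcode.jl not found here; skipping the run"
fi
```

With the directive in the script, the submission would shrink to something like ‘qsub parjulia_job.sh’.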
As a test, I ran that with 2, 4, and 8 cores, and got times of:
Cores | Time |
---|---|
2 | 40.889s |
4 | 21.828s |
8 | 13.980s |
So scaling is pretty good: doubling from 2 to 4 cores nearly halves the run time, though the gain tapers off by 8 cores. There’s a way to use multiple cores on multiple boxes, which would allow you to scale even larger (at some performance cost for network latency), which I will document in a future post.
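To put numbers on the scaling, speedup and parallel efficiency can be computed from the timings in the table. A small sketch using awk; taking the 2-core run as the baseline is an assumption made here, since no 1-core time was measured.

```shell
#!/bin/bash
# Timings copied from the table above: cores and wall-clock seconds.
timings="2 40.889
4 21.828
8 13.980"

# speedup    = baseline time / time
# ideal      = cores / baseline cores
# efficiency = speedup / ideal
report=$(echo "$timings" | awk '
    NR == 1 { base_cores = $1; base_time = $2 }
    {
        speedup = base_time / $2
        ideal = $1 / base_cores
        printf "%d cores: speedup %.2fx (ideal %.0fx), efficiency %.1f%%\n",
               $1, speedup, ideal, 100 * speedup / ideal
    }')
echo "$report"
```

Relative to 2 cores, this reports a speedup of about 1.87x on 4 cores and about 2.92x on 8 cores, against ideal values of 2x and 4x.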
Multiple Tasks from One Job: Julia Array Jobs
Very often your best bet is to run many tasks in an array job. You get linear scaling, and the tasks don’t have to wait until all cores are available. That’s the ‘-t 1-#’ option in qsub. It launches ‘#’ tasks, and each one has the environment variable ‘SGE_TASK_ID’ (similar to ‘NSLOTS’, above) set to its task number, which the job script can use to pick that task’s inputs. The Julia script (array_julia.jl) just prints the arguments it receives:
    #!/usr/bin/env julia
    for x in ARGS
        println(x)
    end
The magic is the variables.txt file, where each line holds the arguments for one task:
A B C 1 2 3 0.11 Acme IBM Gold Silver Bronzea
And the job script:
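    #!/bin/bash
    # filename: array_job.sh
    #$ -j y
    #$ -N Julia_Array_Job
    # run with 'qsub -t 1-$(wc -l <variables.txt) array_job.sh'
    # Pick the line of variables.txt that matches this task's ID
    VARS=$(sed -n "${SGE_TASK_ID}p" variables.txt)
    echo "Launching Julia with VARS = $VARS"
    # $VARS is left unquoted so each word becomes a separate argument
    julia array_julia.jl $VARS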
Launch with:
qsub -t 1-$(wc -l <variables.txt) array_job.sh
See the files in /usr/local/demo/Julia/array_job.
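The line-picking step can be tried without the scheduler by setting SGE_TASK_ID by hand. A minimal sketch, assuming a hypothetical file demo_variables.txt with made-up contents (not the demo’s variables.txt); it only prints the command each task would run.

```shell
#!/bin/bash
# Hypothetical input file: one line of arguments per task.
cat > demo_variables.txt <<'EOF'
alpha 1
beta 2
gamma 3
EOF

# Number of tasks = number of lines, as in 'qsub -t 1-$(wc -l <file)'.
# The arithmetic expansion strips any padding that wc adds.
ntasks=$(( $(wc -l < demo_variables.txt) ))

# Simulate the scheduler: one pass per task ID.
for SGE_TASK_ID in $(seq 1 "$ntasks"); do
    VARS=$(sed -n "${SGE_TASK_ID}p" demo_variables.txt)
    echo "task $SGE_TASK_ID would run: julia array_julia.jl $VARS"
done
```

Each task sees only its own line, which is why the tasks can run independently and in any order.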