Add S3 support to Spark/Spark-Shell

Out of the box, Spark does not come with S3 support, so running something like this in the spark-shell:

    scala> spark.read.parquet("s3a://my-bucket/my-data.parquet").printSchema

will yield something like this:

    2018-09-05 09:47:59 WARN FileStreamSink:66 - Error while looking for metadata directory.
    java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:705)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
        at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:389)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
        at scala. …

gitlab-runner fails to run image

Gitlab-runner gave me the following error when trying to run a job from one of my own images:

    ...
    Skipping Git submodules setup
    [: 1: [: Syntax error: end of file unexpected
    [: 1: [: Syntax error: end of file unexpected
    ERROR: Job failed: exit code 2

It turns out the issue was in docker-entrypoint.sh, where exec $@ needs to be exec "$@". I'm not sure why exactly, but that fixed the issue. …
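A likely explanation: an unquoted $@ is word-split again by the shell, so an argument that contains spaces (for example a whole script passed to sh -c) gets chopped into separate words, while "$@" passes every argument through unchanged. The Python sketch below reproduces this with a hypothetical one-line entrypoint; `run_entrypoint` and the example command are illustrative and not taken from GitLab Runner, and it assumes a POSIX `sh` on the PATH.

```python
import subprocess

# Two variants of a hypothetical docker-entrypoint.sh body; the only
# difference is whether $@ is quoted.
BROKEN_ENTRYPOINT = 'exec $@'
FIXED_ENTRYPOINT = 'exec "$@"'


def run_entrypoint(entrypoint: str, *args: str) -> str:
    """Run the entrypoint snippet with /bin/sh, passing `args` as the
    container command ($1..$n), and return what it printed to stdout."""
    result = subprocess.run(
        # The word right after the script becomes $0 inside `sh -c`.
        ["sh", "-c", entrypoint, "docker-entrypoint.sh", *args],
        capture_output=True,
        text=True,
    )
    return result.stdout


# The container command is three arguments; the last one is a whole script.
command = ("sh", "-c", "echo hello world")

if __name__ == "__main__":
    # Unquoted: the script is re-split into words, so the inner shell runs
    # only "echo" (with "hello" as $0 and "world" as $1) and prints a blank line.
    print(repr(run_entrypoint(BROKEN_ENTRYPOINT, *command)))  # '\n'
    # Quoted: every argument reaches the inner shell unchanged.
    print(repr(run_entrypoint(FIXED_ENTRYPOINT, *command)))   # 'hello world\n'
```

With a longer script such as an if [ ... ]; then ... fi block, the inner shell would plausibly receive only the first word as its script, which is consistent with the "Syntax error: end of file unexpected" messages above.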

Debugging Play Applications in IntelliJ

Debugging Play applications in IntelliJ can be made much easier by using remote debugging. Run your Play application as usual and add -jvm-debug DEBUG_PORT to the sbt arguments:

    sbt -jvm-debug 9999 run

Once your application is running, you can open a remote debugging connection in IntelliJ:

1. Run -> Edit Configurations …
2. Add New Configuration
3. Choose "Remote"
4. All settings in this dialog should be fine, except for the port, which needs to be the port you specified earlier (DEBUG_PORT), 9999 in my case
5. Give the configuration a meaningful name
6. OK

You can now run "Debug" and choose the configuration you created above. At this point IntelliJ should report that the connection was successful, which means you can set breakpoints and debug as usual. …
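If IntelliJ refuses to connect, it can help to check whether anything is actually speaking the debug protocol on that port. IntelliJ's "Remote" configuration attaches over JDWP (the Java Debug Wire Protocol), whose handshake is simply the 14 ASCII bytes "JDWP-Handshake" sent by the debugger and echoed back by the JVM. The Python probe below is a hedged sketch of that handshake, not part of Play, sbt or IntelliJ; the function name is hypothetical and port 9999 is the example port from above.

```python
import socket

# The debugger sends these 14 ASCII bytes; a JDWP agent echoes them back.
HANDSHAKE = b"JDWP-Handshake"


def is_jdwp_listening(host: str = "localhost", port: int = 9999,
                      timeout: float = 2.0) -> bool:
    """Return True if a JDWP agent answers the handshake on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(HANDSHAKE)
            reply = b""
            # Read until we have as many bytes as we sent (or the peer closes).
            while len(reply) < len(HANDSHAKE):
                chunk = sock.recv(len(HANDSHAKE) - len(reply))
                if not chunk:
                    break
                reply += chunk
            return reply == HANDSHAKE
    except OSError:
        # Connection refused, timeout, unreachable host, ...
        return False


if __name__ == "__main__":
    print(is_jdwp_listening(port=9999))
```

The probe disconnects right after the handshake, so run it before attaching IntelliJ rather than while a debug session is open. A False result usually means sbt was started without -jvm-debug, with a different port, or the application isn't running yet.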

Ansible 2.4: configure inventory file path on macOS

After upgrading Ansible to 2.4 (from Homebrew), my previous inventory file (/usr/local/etc/ansible/hosts) was no longer found:

    [WARNING]: Unable to parse /etc/ansible/hosts as an inventory source

My solution was to configure the path to the inventory file in ~/.ansible.cfg:

    [defaults]
    inventory = /usr/local/etc/ansible/hosts

Note that before Ansible 2.4, the setting for the inventory file used to be called hostfile. It is now deprecated, and you should use inventory instead. …
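To check which of the two keys an existing config file uses, a small script can read it as INI. The sketch below uses Python's configparser and only inspects the [defaults] section of the text you give it; the function name is hypothetical, and it is not Ansible's own config loader, which also looks at other locations (such as the ANSIBLE_CONFIG environment variable or an ansible.cfg in the current directory).

```python
import configparser
from pathlib import Path
from typing import Optional, Tuple


def inventory_setting(cfg_text: str) -> Tuple[Optional[str], bool]:
    """Return (inventory path, uses_deprecated_hostfile) for an ansible.cfg text.

    Only [defaults] is inspected; `inventory` wins over the older
    `hostfile` key if both are present.
    """
    # interpolation=None so that a '%' in a path is taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("defaults"):
        return None, False
    defaults = parser["defaults"]
    if "inventory" in defaults:
        return defaults["inventory"], False
    if "hostfile" in defaults:
        return defaults["hostfile"], True
    return None, False


if __name__ == "__main__":
    cfg = Path.home() / ".ansible.cfg"
    if cfg.is_file():
        path, deprecated = inventory_setting(cfg.read_text())
        print(f"inventory: {path}")
        if deprecated:
            print("uses the deprecated 'hostfile' key; rename it to 'inventory'")
    else:
        print(f"{cfg} not found")
```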

Signing git commits with GPG on macOS for GitHub/GitLab

First, install GnuPG in order to create the keys for signing your git commits. Also install pinentry-mac, which git/gpg needs later to display the passphrase dialog for decrypting your keys.

    brew install gnupg pinentry-mac

Create a GPG key pair:

    gpg --full-gen-key

GPG will ask you for some more information. Use RSA/RSA (option 1) for the key type, 4096 for the key size, an expiration date for the keys, and a name and email (these should be the same name and email you use for your git commits). …
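If you'd rather script this step, GnuPG also supports unattended key generation from a parameter file. The Python helper below is a minimal sketch that only builds such a file from the same choices as above (RSA/RSA, 4096 bits, an expiry, name and email); the function name, the "1y" default expiry and the example identity are our own assumptions.

```python
def gpg_key_params(name: str, email: str, expire: str = "1y") -> str:
    """Build a GnuPG unattended key-generation parameter file mirroring the
    interactive choices: RSA primary key and RSA subkey, 4096 bits each."""
    for value in (name, email, expire):
        # A newline would smuggle extra directives into the parameter file.
        if not value or "\n" in value:
            raise ValueError("parameters must be non-empty single-line strings")
    lines = [
        "Key-Type: RSA",
        "Key-Length: 4096",
        "Subkey-Type: RSA",
        "Subkey-Length: 4096",
        f"Name-Real: {name}",    # should match git's user.name
        f"Name-Email: {email}",  # should match git's user.email
        f"Expire-Date: {expire}",
        # No Passphrase line on purpose (assumption: gpg then asks via pinentry).
        "%commit",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(gpg_key_params("Jane Doe", "jane@example.com"), end="")
```

Saved as, say, params.txt, the output could be passed to gpg --batch --generate-key params.txt (GnuPG 2.1 or later; check the manual for your version). Because the sketch sets no passphrase, we assume gpg will prompt for one through pinentry, which is where pinentry-mac comes in.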

Your PHP array indices getting messed up when unsetting values?

That's because PHP arrays are internally handled like hashes (ordered maps). Let me give you an example:

    <?php
    // Let's assume we have an array of numbers
    $numbers = array(1, 2, 3, 4, 5);

    // Which we are going to inspect
    var_dump($numbers);
    // Output is
    /*
    array(5) { [0] => int(1) [1] => int(2) [2] => int(3) [3] => int(4) [4] => int(5) }
    */

    // And we are going to unset the 3rd element
    unset($numbers[2]);

    // Let's inspect again
    var_dump($numbers);
    // Output is
    /*
    array(4) { [0] => int(1) [1] => int(2) [3] => int(4) [4] => int(5) }
    */
    ? …
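To see the same "it's really a hash" effect outside PHP, here is a Python analogy (not PHP code): a dict with integer keys plays the role of the PHP array and del plays the role of unset(). The helper names are hypothetical; reindex mirrors what PHP's built-in array_values() does, returning the values with fresh consecutive keys.

```python
def php_like_array(values):
    """Model a PHP list as a dict: integer keys 0..n-1 mapped to values."""
    return dict(enumerate(values))


def reindex(array):
    """Rebuild consecutive keys 0..n-1, analogous to PHP's array_values()."""
    return dict(enumerate(array.values()))


if __name__ == "__main__":
    numbers = php_like_array([1, 2, 3, 4, 5])
    del numbers[2]                 # like unset($numbers[2]): key 2 is simply gone
    print(list(numbers.keys()))    # [0, 1, 3, 4]
    print(reindex(numbers))        # {0: 1, 1: 2, 2: 4, 3: 5}
```

Removing a key never renumbers the others, because the keys are stored with the values rather than derived from their positions.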

Symfony 2 - Get the CSRF-Token of a form in a Twig template

You probably already know that you can get the value of each widget of a form using form.vars.value.{widget_name} in your Twig template (where {widget_name} is the name of the widget whose value you want). However, when you try to get the CSRF token this way, you will get an error saying that the variable _token does not exist. To access the CSRF token of a Symfony 2 form in Twig, you have to use …

A bulletproof pattern for creating Doctrine subqueries of any complexity

Doctrine subqueries can be very frustrating. They sometimes work, but as soon as you reach a certain level of complexity, Doctrine just can't handle them anymore. I will show you how to write subqueries in Doctrine that you can nest as many levels deep as you want, without Doctrine complaining and while still using DQL (Doctrine Query Language). Note that this post refers to Doctrine version 1.2. Version 2 of Doctrine has already been released, but many developers still use symfony 1. …

Browsers automatically submit single input field forms on enter (and how to fix that)

Normally, browsers will not submit a form that has no submit button when the Enter key is pressed. You have to implement this kind of feature via JavaScript (and you probably already have at some point). However, if your form has only ONE input field, most browsers will submit the form when the Enter key is pressed. You don't even need a submit button for this to happen. This is deeply rooted in the HTML 2. …
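Today this behavior is written down in the HTML Standard's "implicit submission" rules: if the form has a submit button, Enter clicks the first one (unless it is disabled); if it has none, the form is submitted only when it contains at most one "field that blocks implicit submission" (text-like inputs such as text, email, password or number). The Python sketch below models that decision rule, not any browser's actual code; Control and enter_submits are hypothetical names, and it ignores details like the form attribute and assumes the browser supports implicit submission at all, which the spec leaves optional.

```python
from dataclasses import dataclass
from typing import List

# <input> types whose fields "block implicit submission" per the HTML Standard
BLOCKING_TYPES = {
    "text", "search", "tel", "url", "email", "password",
    "date", "month", "week", "time", "datetime-local", "number",
}


@dataclass
class Control:
    tag: str             # "input" or "button"
    type_attr: str = ""  # value of the type attribute ("" = not set)
    disabled: bool = False

    def effective_type(self) -> str:
        if self.tag == "button":
            # A <button> without a valid type is a submit button.
            return self.type_attr if self.type_attr in ("submit", "reset", "button") else "submit"
        return self.type_attr or "text"  # <input> defaults to text


def enter_submits(controls: List[Control]) -> bool:
    """Does pressing Enter in a text field submit a form with these controls?"""
    submit_buttons = [
        c for c in controls
        if (c.tag == "button" and c.effective_type() == "submit")
        or (c.tag == "input" and c.effective_type() in ("submit", "image"))
    ]
    if submit_buttons:
        # Enter "clicks" the default (first) submit button, unless it is disabled.
        return not submit_buttons[0].disabled
    blocking = [c for c in controls
                if c.tag == "input" and c.effective_type() in BLOCKING_TYPES]
    # No submit button: more than one blocking field prevents submission.
    return len(blocking) <= 1
```

For example, a form with a single text input gives True, while adding a second text input (and still no submit button) gives False, which matches the behavior described above.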

How to discard the query string in a RewriteRule (Apache, mod_rewrite)

Removing the query string in a rewrite rule with Apache's mod_rewrite module is a bit tricky. Let's say you want to redirect the URL http://www.example.com?query=test to http://www.example.com/noqueries in Apache with mod_rewrite enabled. Rewriting the URL is no problem, but by default Apache appends the original query string (?query=test) to the resulting URL, which we don't want in this example. If you're using Apache 2.4 or later, you can use the QSD flag (qsdiscard) to remove the query string like this: …