Showing posts with label hack. Show all posts

7 Nov 2017

Prewarming / Initializing an RDS Postgres instance (from S3)

UPDATE: Read this for recent updates. Now the SQL successfully fetches *all* disk blocks on most RDS PostgreSQL (read post for the rare exceptions).


As many of you know, AWS RDS Postgres uses EBS, which has an interesting feature called Lazy Loading: it can instantiate a disk (of almost any size from 10GB to 6TB) and bring it online within a matter of minutes. Although a fantastic feature, it can lead to unexpected outcomes when high-end production load is thrown at a newly launched RDS Postgres instance immediately after restoring from a Snapshot.

One possible solution is to use the pg_prewarm Postgres Extension that is well supported in RDS Postgres, immediately after Restoring from a Snapshot, thereby reducing the side-effects of Lazy Loading.

Although pg_prewarm was originally meant for populating the buffer-cache, in this specific use-case the extension is heaven-sent for initializing (fetching) almost the entire snapshot from S3 onto the RDS EBS volume in question. So even if running pg_prewarm through all tables evicts the previously prewarmed table from the buffer-cache, it still does the job of initializing all the disk-blocks on the EBS volume.

I've just checked in the SQL to this repository that seems to do this magic pretty well. It also lists why this would only take you ~70% of the way, owing to restrictions / limitations (as per my current understanding).

In the Sample below, I restored a new RDS Postgres instance from a Snapshot and immediately thereafter ran this SQL on it.


  • Notice that the first table (pgbench_accounts) takes about 22 seconds to load the first time, and less than a second to load the second time.
  • Similarly, the second table (pgbench_history) takes about 11 seconds to load the first time (going by the timestamps below) and less than a second the second time :) !



pgbench=>       SELECT clock_timestamp(), pg_prewarm(c.oid::regclass),
pgbench->       relkind, c.relname
pgbench->       FROM pg_class c
pgbench->         JOIN pg_namespace n
pgbench->           ON n.oid = c.relnamespace
pgbench->         JOIN pg_user u
pgbench->           ON u.usesysid = c.relowner
pgbench->       WHERE u.usename NOT IN ('rdsadmin', 'rdsrepladmin', ' pg_signal_backend', 'rds_superuser', 'rds_replication')
pgbench->       ORDER BY c.relpages DESC;
        clock_timestamp        | pg_prewarm | relkind |        relname
-------------------------------+------------+---------+-----------------------
 2017-11-07 11:41:44.341724+00 |      17903 | r       | pgbench_accounts
 2017-11-07 11:42:06.059177+00 |       6518 | r       | pgbench_history
 2017-11-07 11:42:17.126768+00 |       2745 | i       | pgbench_accounts_pkey
 2017-11-07 11:42:21.406054+00 |         45 | r       | pgbench_tellers
 2017-11-07 11:42:21.645859+00 |         24 | r       | pgbench_branches
 2017-11-07 11:42:21.757086+00 |          2 | i       | pgbench_branches_pkey
 2017-11-07 11:42:21.757653+00 |          2 | i       | pgbench_tellers_pkey
(7 rows)

pgbench=>
pgbench=>       SELECT clock_timestamp(), pg_prewarm(c.oid::regclass),
pgbench->       relkind, c.relname
pgbench->       FROM pg_class c
pgbench->         JOIN pg_namespace n
pgbench->           ON n.oid = c.relnamespace
pgbench->         JOIN pg_user u
pgbench->           ON u.usesysid = c.relowner
pgbench->       WHERE u.usename NOT IN ('rdsadmin', 'rdsrepladmin', ' pg_signal_backend', 'rds_superuser', 'rds_replication')
pgbench->       ORDER BY c.relpages DESC;
        clock_timestamp        | pg_prewarm | relkind |        relname
-------------------------------+------------+---------+-----------------------
 2017-11-07 11:42:33.914195+00 |      17903 | r       | pgbench_accounts
 2017-11-07 11:42:33.917725+00 |       6518 | r       | pgbench_history
 2017-11-07 11:42:33.918919+00 |       2745 | i       | pgbench_accounts_pkey
 2017-11-07 11:42:33.919412+00 |         45 | r       | pgbench_tellers
 2017-11-07 11:42:33.919427+00 |         24 | r       | pgbench_branches
 2017-11-07 11:42:33.919438+00 |          2 | i       | pgbench_branches_pkey
 2017-11-07 11:42:33.919443+00 |          2 | i       | pgbench_tellers_pkey
(7 rows)
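
The per-relation load times can be read off the output itself by taking the gap between consecutive clock_timestamp() values. Below is a minimal Python sketch of that arithmetic. It assumes each row's timestamp is taken just before that row's pg_prewarm() call and that every timestamp carries the same UTC offset; the helper name elapsed_per_relation is made up for this illustration.

```python
from datetime import datetime

def elapsed_per_relation(rows):
    """Return (relname, seconds) pairs from (clock_timestamp, relname) rows.

    Each duration is the gap to the next row's timestamp, so the last
    relation gets no entry (nothing follows it to mark its end).
    """
    parsed = []
    for stamp, relname in rows:
        # Drop the trailing UTC offset ("+00"); assumed identical on every row.
        naive = stamp.split("+")[0]
        parsed.append((datetime.strptime(naive, "%Y-%m-%d %H:%M:%S.%f"), relname))
    return [
        (name, (next_ts - ts).total_seconds())
        for (ts, name), (next_ts, _) in zip(parsed, parsed[1:])
    ]

# First four rows of the first run, copied from the output above.
first_run = [
    ("2017-11-07 11:41:44.341724+00", "pgbench_accounts"),
    ("2017-11-07 11:42:06.059177+00", "pgbench_history"),
    ("2017-11-07 11:42:17.126768+00", "pgbench_accounts_pkey"),
    ("2017-11-07 11:42:21.406054+00", "pgbench_tellers"),
]

for relname, seconds in elapsed_per_relation(first_run):
    print(f"{relname}: {seconds:.1f}s")
```

Feeding it the rows of the second run instead shows how little time each relation needs once its blocks have already been fetched.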


29 Sept 2017

PsqlForks now recognizes PgBouncer

With this commit, PsqlForks knows when it's talking to PgBouncer (and not Postgres).

Down the line, this should pave the way for PsqlForks to more cleanly convey why (most of) the given psql shortcut(s) don't work (and what else does).

As always, the psql/README has the most up-to-date status of any engine support.

$ psql -h localhost -E -p6543 -U postgres pgbouncer
psql (client-version:11devel, server-version:1.7.1/bouncer, engine:pgbouncer)
Type "help" for help.

pgbouncer=# show version;
NOTICE:  pgbouncer version 1.7.1
SHOW
pgbouncer=#

25 Sept 2017

PsqlForks now supports CockroachDB

PsqlForks now supports CockroachDB as much as is currently possible. You can check its current SQL status here.

$ /opt/postgres/master/bin/psql -h localhost -E -p 26257 -U root
psql (client-version:11devel, server-version:9.5.0, engine:cockroachdb)
Type "help" for help.

root=> select version();
                                version()
--------------------------------------------------------------------------
 CockroachDB CCL v1.0.6 (linux amd64, built 2017/09/14 15:15:48, go1.8.3)
(1 row)
bank=> \l
                                      List of databases
        Name        | Owner |     Encoding      |  Collate   |   Ctype    | Access privileges
--------------------+-------+-------------------+------------+------------+-------------------
 bank               |       | Not Supported Yet | en_US.utf8 | en_US.utf8 | Not Supported Yet
 crdb_internal      |       | Not Supported Yet | en_US.utf8 | en_US.utf8 | Not Supported Yet
 information_schema |       | Not Supported Yet | en_US.utf8 | en_US.utf8 | Not Supported Yet
 pg_catalog         |       | Not Supported Yet | en_US.utf8 | en_US.utf8 | Not Supported Yet
 system             |       | Not Supported Yet | en_US.utf8 | en_US.utf8 | Not Supported Yet
(5 rows)
bank=> \dv
      List of relations
 Schema | Name | Type | Owner
--------+------+------+-------
 bank   | a    | view |
(1 row)

bank=> \di
                       List of relations
 Schema |          Name           | Type  | Owner |   Table
--------+-------------------------+-------+-------+------------
 bank   | primary                 | index |       | accounts
 system | jobs_status_created_idx | index |       | jobs
 system | primary                 | index |       | descriptor
 system | primary                 | index |       | eventlog
 system | primary                 | index |       | jobs
 system | primary                 | index |       | lease
 system | primary                 | index |       | namespace
 system | primary                 | index |       | rangelog
 system | primary                 | index |       | settings
 system | primary                 | index |       | ui
 system | primary                 | index |       | users
 system | primary                 | index |       | zones
(12 rows)

15 Sept 2017

PsqlForks now supports PipelineDB

After working on this PSQL variant that intends to support all Postgres forks, I finally got around to naming it.

Since this was essentially Psql (for) Forks, quite intuitively, I chose to name it PsqlForks.

Considering that until recently this fork supported only Amazon Redshift, the name wouldn't have made much sense unless it supported at least 2 forks :) !

Thus, PsqlForks now supports PipelineDB!


$  /opt/postgres/master/bin/psql -U pipeline -p 5434 -h localhost pipeline
psql (client-version:11devel, server-version:9.5.3, engine:pipelinedb)
Type "help" for help.

pipeline=# \q

2 Sept 2017

psql \d now supports Interleaved / Compound SORTKEYs (in Redshift)

Continuing the series on Redshift support, Describe Table (e.g. \d tbl) now shows SORTKEY details. This resolves Issue #6 and shows both the COMPOUND / INTERLEAVED variations along with the column names.

This change was complicated because Redshift doesn't natively support the LISTAGG() function on System / Catalog tables, which meant that I had to resort to a pretty verbose workaround. This in turn means that this patch shows only the first ten columns of a COMPOUND SORTKEY. Seriously speaking, it would take an extreme corner-case for someone to genuinely require a SORTKEY with 10+ columns.

This is not a limitation for INTERLEAVED SORTKEY since it only supports a maximum of 8 Columns.


db=# CREATE TABLE tbl_pk(custkey SMALLINT PRIMARY KEY);
CREATE TABLE
db=# \d tbl_pk
                                           Table "public.tbl_pk"
 Column  |   Type   | Encoding | DistKey | SortKey | Preload | Encryption | Collation | Nullable | Default
---------+----------+----------+---------+---------+---------+------------+-----------+----------+---------
 custkey | smallint | lzo      | f       | 0       | f       | none       |           | not null |
Indexes:
 PRIMARY KEY, btree (custkey)

db=# CREATE TABLE tbl_compound(
db(#   custkey   SMALLINT                ENCODE delta NOT NULL,
db(#   custname  INTEGER DEFAULT 10      ENCODE raw NULL,
db(#   gender    BOOLEAN                 ENCODE RAW,
db(#   address   CHAR(5)                 ENCODE LZO,
db(#   city      BIGINT identity(0, 1)   ENCODE DELTA,
db(#   state     DOUBLE PRECISION        ENCODE Runlength,
db(#   zipcode   REAL,
db(#   tempdel1  DECIMAL                 ENCODE Mostly16,
db(#   tempdel2  BIGINT                  ENCODE Mostly32,
db(#   tempdel3  DATE                    ENCODE DELTA32k,
db(#   tempdel4  TIMESTAMP               ENCODE Runlength,
db(#   tempdel5  TIMESTAMPTZ             ENCODE DELTA,
db(#   tempdel6  VARCHAR(MAX)            ENCODE text32k,
db(#   start_date VARCHAR(10)            ENCODE TEXT255
db(# )
db-# DISTSTYLE KEY
db-# DISTKEY (custname)
db-# COMPOUND SORTKEY (custkey, custname, gender, address, city, state, zipcode, tempdel1, tempdel2, tempdel3, tempdel4, tempdel5, start_date);
CREATE TABLE
db=#
db=# \d tbl_compound
                                                                 Table "public.tbl_compound"
   Column   |            Type             | Encoding  | DistKey | SortKey | Preload | Encryption | Collation | Nullable |              Default
------------+-----------------------------+-----------+---------+---------+---------+------------+-----------+----------+------------------------------------
 custkey    | smallint                    | delta     | f       | 1       | f       | none       |           | not null |
 custname   | integer                     | none      | t       | 2       | f       | none       |           |          | 10
 gender     | boolean                     | none      | f       | 3       | f       | none       |           |          |
 address    | character(5)                | lzo       | f       | 4       | f       | none       |           |          |
 city       | bigint                      | delta     | f       | 5       | f       | none       |           |          | "identity"(494055, 4, '0,1'::text)
 state      | double precision            | runlength | f       | 6       | f       | none       |           |          |
 zipcode    | real                        | none      | f       | 7       | f       | none       |           |          |
 tempdel1   | numeric(18,0)               | mostly16  | f       | 8       | f       | none       |           |          |
 tempdel2   | bigint                      | mostly32  | f       | 9       | f       | none       |           |          |
 tempdel3   | date                        | delta32k  | f       | 10      | f       | none       |           |          |
 tempdel4   | timestamp without time zone | runlength | f       | 11      | f       | none       |           |          |
 tempdel5   | timestamp with time zone    | delta     | f       | 12      | f       | none       |           |          |
 tempdel6   | character varying(65535)    | text32k   | f       | 0       | f       | none       |           |          |
 start_date | character varying(10)       | text255   | f       | 13      | f       | none       |           |          |
Indexes:
 COMPOUND SORTKEY (address,tempdel2,start_date,custkey,zipcode,tempdel4,city,state,tempdel3,custname)

db=# CREATE TABLE tbl_interleaved(custkey SMALLINT) INTERLEAVED SORTKEY (custkey);
CREATE TABLE
db=# \d tbl_interleaved
                                      Table "public.tbl_interleaved"
 Column  |   Type   | Encoding | DistKey | SortKey | Preload | Encryption | Collation | Nullable | Default
---------+----------+----------+---------+---------+---------+------------+-----------+----------+---------
 custkey | smallint | none     | f       | 1       | f       | none       |           |          |
Indexes:
 INTERLEAVED SORTKEY (custkey)
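
To make the ten-column cap concrete, here is a small Python sketch of how such a summary line could be assembled from the per-column SortKey ordinals that \d prints. It is only an illustration and not the SQL used in the patch: sortkey_summary and MAX_COMPOUND_COLUMNS are names invented here, and listing the columns in ordinal order is this sketch's own choice.

```python
MAX_COMPOUND_COLUMNS = 10  # the cap described in this post

def sortkey_summary(columns, style="COMPOUND"):
    """Build a summary line from (column_name, sortkey_ordinal) pairs.

    An ordinal of 0 means the column is not part of the sort key.
    """
    keyed = sorted((pos, name) for name, pos in columns if pos > 0)
    names = [name for _, name in keyed]
    if style == "COMPOUND":
        # Only the first ten key columns are reported for COMPOUND keys.
        names = names[:MAX_COMPOUND_COLUMNS]
    return f"{style} SORTKEY ({','.join(names)})"

# Ordinals copied from the SortKey column of the tbl_compound sample.
tbl_compound = [
    ("custkey", 1), ("custname", 2), ("gender", 3), ("address", 4),
    ("city", 5), ("state", 6), ("zipcode", 7), ("tempdel1", 8),
    ("tempdel2", 9), ("tempdel3", 10), ("tempdel4", 11), ("tempdel5", 12),
    ("tempdel6", 0), ("start_date", 13),
]

print(sortkey_summary(tbl_compound))
print(sortkey_summary([("custkey", 1)], style="INTERLEAVED"))
```

With thirteen key columns defined, the COMPOUND line stops after the tenth, which is the limitation mentioned earlier.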

As a side-note, there is a consideration as to whether this should be in a separate section of its own (and not under Indexes, where it clearly doesn't belong). Maybe another day. Happy Redshifting :) !

Update (15th Sep 2017):
This project has now been named PsqlForks!

31 Aug 2017

psql \d now supports DISTKEY / SORTKEY / ENCODING (in Redshift)

This is in continuation of my work for (my forked version of) psql to better support Redshift (read more here).

Now \d table provides some additional Redshift specific table properties such as:
  • DISTKEY
  • SORTKEY
  • COMPRESSION (ENCODING)
  • ENCRYPTION
Sample:

t3=# CREATE TABLE customer(
  custkey   SMALLINT                ENCODE delta NOT NULL,
  custname  INTEGER DEFAULT 10      ENCODE raw NULL,
  gender    BOOLEAN                 ENCODE RAW,
  address   CHAR(5)                 ENCODE LZO,
  city      BIGINT identity(0, 1)   ENCODE DELTA,
  state     DOUBLE PRECISION        ENCODE Runlength,
  zipcode   REAL,
  tempdel1  DECIMAL                 ENCODE Mostly16,
  tempdel2  BIGINT                  ENCODE Mostly32,
  tempdel3  DATE                    ENCODE DELTA32k,
  tempdel4  TIMESTAMP               ENCODE Runlength,
  tempdel5  TIMESTAMPTZ             ENCODE DELTA,
  tempdel6  VARCHAR(MAX)            ENCODE text32k,
  start_date VARCHAR(10)            ENCODE TEXT255
)
DISTSTYLE KEY
DISTKEY (custname)
INTERLEAVED SORTKEY (custkey, custname);
CREATE TABLE
t3=# \d customer
                                                                   TABLE "public.customer"
   Column   |            Type             | Encoding  | DistKey | SortKey | Preload | Encryption | Collation | Nullable |              Default
------------+-----------------------------+-----------+---------+---------+---------+------------+-----------+----------+------------------------------------
 custkey    | smallint                    | delta     | f       | 1       | f       | none       |           | not null |
 custname   | integer                     | none      | t       | 2       | f       | none       |           |          | 10
 gender     | boolean                     | none      | f       | 0       | f       | none       |           |          |
 address    | character(5)                | lzo       | f       | 0       | f       | none       |           |          |
 city       | bigint                      | delta     | f       | 0       | f       | none       |           |          | "identity"(493983, 4, '0,1'::text)
 state      | double precision            | runlength | f       | 0       | f       | none       |           |          |
 zipcode    | real                        | none      | f       | 0       | f       | none       |           |          |
 tempdel1   | numeric(18,0)               | mostly16  | f       | 0       | f       | none       |           |          |
 tempdel2   | bigint                      | mostly32  | f       | 0       | f       | none       |           |          |
 tempdel3   | date                        | delta32k  | f       | 0       | f       | none       |           |          |
 tempdel4   | timestamp without time zone | runlength | f       | 0       | f       | none       |           |          |
 tempdel5   | timestamp with time zone    | delta     | f       | 0       | f       | none       |           |          |
 tempdel6   | character varying(65535)    | text32k   | f       | 0       | f       | none       |           |          |
 start_date | character varying(10)       | text255   | f       | 0       | f       | none       |           |          |

Now that a few 'ToDos' are listed on Github Issues, the next step would probably be to work on this ticket, which aims to show more elaborate SORTKEY details (such as INTERLEAVED / COMPOUND) when using Describe Table.

Update (15th Sep 2017):
This project has now been named PsqlForks!

12 Aug 2017

Redshift support for psql

I'm sure you know that psql doesn't go out of its way to support Postgres' forks natively. I obviously understand the reasoning, which allowed me to find a gap that I could fill here.

The existing features (in psql) that work with any Postgres fork (like Redshift) do so entirely because it is a fork of Postgres. Since I use psql heavily at work, last week I decided to begin maintaining a Postgres fork that better supports Postgres forks (initially just Redshift). As always, unless explicitly mentioned, this is entirely an unofficial effort.

The 'redshift' branch of this Postgres code-base, is aimed at supporting Redshift in many ways:
  • Support Redshift related artifacts
    • Redshift specific SQL Commands / variations
    • Redshift Libraries
  • Support AWS specific artifacts
  • Support Redshift specific changes
    • For e.g. "\d table" etc.

The idea is:
  • Maintain this branch for the long-term
    • At least as long as I have an accessible Redshift cluster
  • Down the line look at whether other Postgres forks (for e.g. RDS Postgres) need such special attention
    • Although nothing much stands out yet
      • Except some rare exceptions like this or this, which do need to go through an arduous long wait / process of refinement.
  • Change the default port to 5439 (or whatever the flavour supports)
    • ...with an evil grin ;)
  • Additionally, as far as possible:
    • Keep submitting Postgres related patches back to Postgres master
    • Keep this branch up to date with Postgres master

Update (31st August 2017)
  • Currently this branch supports most Redshift specific SQL commands such as
    • CREATE LIBRARY
    • CREATE TABLE (DISTKEY / DISTSTYLE / ...)
    • Returns non-SQL items like
      • ENCODINGs (a.k.a. Compressions like ZSTD / LZO etc )
      • REGIONs (for e.g. US-EAST-1 etc.)
  • Of course some complex variants (for e.g. GRANT SELECT, UPDATE ON ALL TABLES IN SCHEMA TO GROUP xxx) don't automatically come up with the tab-complete feature. This is primarily because psql's tab-complete isn't a full-fledged parser to begin with, and so isn't powerful enough to cater to all such scenarios.
  • In a nutshell, this branch is now in a pretty good shape to auto-complete the most common Redshift specific SQL Syntax.
  • The best part is that this still merges perfectly with Postgres mainline!

    Let me know if you find anything that needs inclusion, or if I missed something.

    ====================================

    $  psql -U redshift_user -h localhost -E -p 5439 db
    psql (client-version:11devel, server-version:8.0.2, engine:redshift)
    Type "help" for help.

    db=#

    3 Aug 2017

    Reducing Wires

    Recently got an additional monitor for my workstation@home and found that the following wires were indispensable:

    • USB Mouse
    • Monitor VGA / HDMI / DVI cable
    • USB Hub cable (Pen Drive etc.)
    I was lucky that this ($20 + used) Dell monitor was an awesome buy since it came with a Monitor USB Hub (besides other goodies such as vertical rotate etc).

    After a bit of rejigging, this is how things finally panned-out:
    • 1 USB Wire (from the laptop) for the MUH (Monitor USB Hub)
      • This is usually something like this.
    • Use a USB->DVI converter and use that to connect MUH -> Monitor DVI port
      • This is usually something like this.
    • Plug USB Mouse to MUH
    • With things working so well, I also plugged a Wireless Touchpad dongle to the MUH
    So now when I need to do some office work, connecting 1 USB wire gets me up and running!

    #LoveOneWires :)

    Now only if I could find a stable / foolproof Wireless solution here ;)

    31 May 2017

    Patch: Using --no-comments with pg_dump

    Recently I submitted a patch for review that allows a non-superuser to practically use a backup taken from pg_dump.

    Currently it is a kludge (and well known at that - Ref 1 / Ref 2 / Ref 3 / Ref 4) but since the ideal solution is too big a bite to chew and not in high demand, it has seen little to no traction in the past decade.

    This patch should allow the above. But more importantly, it should also allow regular users of AWS RDS Postgres as well as Google Cloud Postgres databases (who do not get SuperUser access by-design) to reliably use the backups, instead of tinkering with the backup SQL and removing things like COMMENT ON EXTENSION for it to even run during restoration.
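
For context, that manual tinkering typically amounts to filtering such statements out of a plain-SQL dump before restoring it. Here is a rough Python sketch of the kludge, assuming each COMMENT ON EXTENSION statement sits on a single line; strip_extension_comments is a hypothetical helper (not part of pg_dump) and the sample dump text is made up for illustration.

```python
def strip_extension_comments(dump_sql: str) -> str:
    """Drop single-line COMMENT ON EXTENSION statements from a SQL dump."""
    kept = []
    for line in dump_sql.splitlines(keepends=True):
        # Restoring these statements needs privileges a regular user lacks.
        if line.lstrip().upper().startswith("COMMENT ON EXTENSION"):
            continue
        kept.append(line)
    return "".join(kept)

# Made-up fragment resembling a plain-text dump.
sample_dump = (
    "CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;\n"
    "COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';\n"
    "CREATE TABLE t (id integer);\n"
    "COMMENT ON TABLE t IS 'kept as-is';\n"
)

print(strip_extension_comments(sample_dump))
```

A filter like this is fragile (multi-line statements would slip through), which is part of why an option on pg_dump itself is the more reliable route.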

    The bad news is that since Postgres 10 has already branched off, I doubt this would see the light of the day (if at all) any time before Postgres 11 (unless there is consensus that it's helpful enough and gets 'back-patched' to Postgres 10 to be released around September 2017).

    Update (3rd Oct 2017):
    This is now a part of my PsqlForks branch. You can check the related commit here.

    Update (26th Jan 2018):
    This is now part of the official Postgres v11 branch. You can check the related commit here.

    18 Dec 2015

    Getting my hands on a Google OnHub

    So finally, I get to lay my hands on the new Google OnHub at home.

    Unlike its other Google cousins, the OnHub isn't yet available for sale in India, and it's still a niche product here. The other obvious question is whether you'd pay 6x the price for a router, even when a brand like Google sports it.

    I think I would and so was finally able to put my foot down when I realised that I had had enough of other routers making things difficult at home.

    A few features I liked:
    • Automatic Software update
      • I loved this aspect that is pretty much missing in all its contemporaries
        • Considering that my 10 year old D-Link 502T hasn't received a single firmware / OS update I was scared to death what all crapware was running on my Home WiFi a month back.
        • That coupled with a few Einsteinish Router companies forced to admit secret (idiotically planned) backdoors, it just isn't funny to realise that my Home Router was probably a 'piece of cake' for a script kiddie trying to login.
    • Prioritize a phone
      • Again, it's a pleasure showing my wife how easy it is for her to prioritize her phone when the kids are watching YouTube in HQ.
    • Router configuration a breeze
      • It's super simple to manage
        • I just recently got a Chromecast-for-Audio replacement working at home, and it was pleasant to realise that setting a static IP address wasn't about editing /etc/network/interfaces anymore. The PI2 stayed on DHCP and I set the OnHub to give the MusicBox a static IP hereon... QED :) !
    • Manage your OnHub from China!
      • Once configured, you could manage your router sitting hundreds of miles away!
        • Which basically means no long calls to your GrandMa asking her to read out what is on the screen when she types 'http://192.168.1.1' on the browser.
        • You could be managing multiple OnHubs on your phone, each sitting at your parents place hundreds of miles away (without VNC / TeamViewer / RDP hacks) and still configure every minute detail such as setting Port-Forwarding / DNS / DHCP etc. from your SmartPhone.
    • WiFi connection optimization
      • Frankly, with so many walls, some remote corners of my house have seen some network quality degradation at times, but I haven't seen a 'No Network' message yet. So it's probably doing a good job there, but I can't tell for sure right away.
    Add to that, if we consider that this machine is a dual core machine (with a GPU) most of which isn't even put to use (yet), I am pretty excited to know what its real potential is and how Google upgrades my 'boring router' down the line.

    Rumour is that this might just be Google's shot at an Echo or a Siri sitting in your drawing room. But till that happens, I'd have to stay pleased with a beautiful router sitting on the desk :)

    Now you may want to get paranoid and all and worry about how Google could keep an eye on Dr. Lanning (you), but I have a feeling that it'd take a while before I give breadcrumbs to a Detective Spooner.

    All in all, a (pretty) costly router upgrade but I ain't regretting it.



    17 Dec 2015

    MusicBox + Pi1B + Ancient 4.1 Speakers => Chromecast for Audio

    Since Chromecast-for-Audio hasn't yet been launched in India, eBay sellers are using their near monopoly to charge a painful 150% extra for such a device here. Frustrated with being blackmailed like this, I resorted to conjuring up a combo that finally gave my 2 year old Pi and a ten year old 4.1 music system a new life, and gave a good groove to my (otherwise boring) drawing room :) !

    Parts:
    • Any Raspberry Pi
      • Except PiZero
        • Too low powered, would not work
      • To clarify, either of the following combos would work
        • The old PiA+ / PiB+
          • with SD Card
            • 2GB or more
        • The newer Pi2
          • with MicroSD Card
            • 2GB or more
    • MusicBox
    • Win32DiskImager
    • Client Operating System
      • (Desktop) Windows
        • Bonjour
          • Optional
            • Makes the URL easier
          • I had Apple iTunes, which installs Bonjour support by default
      • (Smartphone) Android
        • Doesn't support Bonjour out of the box
        • Alternative,
          • Get the Pi to come up with a constant IP address
          • Bookmark the URL instead
    • Volume Control
      • The Web Interface allows changing volume
      • From the command line you could try this
        • amixer cset numid=1 -- 80%
    Happy Listening :) !

    20 Nov 2015

    Reverse Port Forwarding and how !

    Recently had to work on a PoC that required spinning off Docker instances with custom configurations. The problem wasn't as much with the Docker aspect, which is probably up for another story sometime, as much as the fact that as luck had it, we couldn't commission a separate machine for this PoC, and I eventually had to do the entire development inside a 2G VirtualBox VM on my 8G Laptop.

    The tricky part was that we had to submit the URL to the PoC Landing Page even before it could be ported to a full-fledged VM/server.

    So effectively, we:
    • Submitted a URL (on ServerA) running Apache, but no space to host a 40G VM.
    • Had a Laptop where the VM was developed, but which can't stay online since it's not always connected
    • Had another server (ServerB) where the VM needs to be ported, to be always up for PoC evaluation
    • ServerA was in the US, whereas, ServerB (& Laptop) are half-way across the world in India.
    Separately, the VM was an Ubuntu installation that had the following:
    • A running NodeJS (Bootstrap / Express combo) serving the PoC GUI
    • Separately it hosted the docker runtime that spun off Docker instances that had their own proprietary application + Tomcat running on port 8080, all of which were eventually supposed to be visible company-wide
    • Thirdly, it had a PostgreSQL 9.3 database that served as a backend to all the running docker-based app instances

    To connect ServerA with the PoC hosted on ServerB we tried the following:
    • Creating VirtualHost entries on the ServerA that pointed to corresponding ports on the VM
      • This failed because the VM was running inside a NATted configuration and thus was invisible to ServerA
    • Then we tried an alternate solution, wherein Apache->VirtualHost entries on ServerA were pointed to port 80 on ServerB. This required port-forwarding port 80 on ServerB to port 3000 (NodeJS) on ServerB. Although for this we used NetSH, this failed for various reasons, some of which we could identify and some we couldn't. For e.g. we installed the ipv4 module for NetSH, also tried installing ipv6, as well as, enabling Windows Firewall for NetSH to work (contrary to popular belief that Windows Firewall might be the reason for things getting blocked)
    • We finally settled down to running persistent SSH connections directly from the VM to ServerA (thereby by-passing all Windows Firewall issues) and setting up Remote Port-Forward such that remote ports could connect to Node Server inside the VM
      • This too initially didn't work as expected. To identify the cause we:
        • Disabled SELinux on ServerB, then ServerA (although that didn't make sense) as well as on the VM to no avail
        • Then we disabled IPTables on all the relevant machines and that still didn't help
        • Finally we realised that the Remote Port-Forwarding though working well, was getting attached to the localhost interface on ServerA. Which means that although 127.0.0.1:80 on ServerA was working as expected, 192.168.1.1:80 on the same server immediately gave a "connection failed". It took a while to realise that though I was logging in trans-atlantic, the immediate 'Connection Failed' was an issue with ServerA and not with this side of the line.
        • What complicated things after that was that the regular -g option was insufficient. It required setting GatewayPorts in the SSHD configuration on ServerA (with an sshd restart) to work as expected.
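
The final setup can be sketched roughly as follows. This Python snippet only assembles the ssh command line, to be run from inside the VM. The user and host names are placeholders, port 3000 is the NodeJS port mentioned above, and it assumes ServerA's sshd_config has GatewayPorts set to yes or clientspecified, since otherwise the forwarded port stays bound to ServerA's loopback interface.

```python
import shlex

def reverse_forward_command(remote_user, remote_host, remote_port, local_port,
                            bind_address="0.0.0.0"):
    """Build an ssh argv that exposes a local port on the remote host.

    -N : run no remote command, only keep the forwarding alive
    -R : remote (reverse) forward, [bind_address:]port:host:hostport
    """
    forward = f"{bind_address}:{remote_port}:localhost:{local_port}"
    return ["ssh", "-N", "-R", forward, f"{remote_user}@{remote_host}"]

# Hypothetical names: 'pocuser' and 'servera.example.com' stand in for ServerA.
cmd = reverse_forward_command("pocuser", "servera.example.com", 80, 3000)
print(shlex.join(cmd))
```

The -g flag only affects local (-L style) forwards, which fits with it not helping here. Also note that binding a port below 1024 on ServerA (such as 80) generally needs a privileged login there.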

    Eventually, after a few hours of transatlantic jugglery, finally got a working URL that looked something like this... What a day !



    12 Nov 2015

    Pigz - Parallel GZip

    For a while, I've been frustrated with the fact that GZip was unable to consume idle CPUs on my laptop. Recently read about PIGZ (a free, drop-in replacement) that implicitly consumes all CPUs.

    Pronounced Pig-Zee, PIGZ is a drop-in replacement for GZIP and automatically consumes all CPUs of the host machine in order to get the compression completed much faster.

    OLD: tar -cf - sourcefolder | gzip > sourcefolder.tar.gz

    NEW: tar -cf - sourcefolder | pigz > sourcefolder.tar.gz
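
To give a feel for why compression parallelizes well, here is a rough Python sketch of the general idea: split the input into chunks, compress the chunks concurrently, and concatenate the results. It is not how pigz itself is implemented; it leans on the gzip format allowing several members back to back, and parallel_gzip, the chunk size and the worker count are choices made for this illustration.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

def parallel_gzip(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress chunks of `data` concurrently and join them into one gzip stream."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so members are joined in the right sequence.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)

payload = b"some fairly repetitive text\n" * 200_000
compressed = parallel_gzip(payload)
# gzip.decompress() reads every concatenated member, restoring the original bytes.
restored = gzip.decompress(compressed)
print(len(payload), len(compressed), restored == payload)
```

Threads are used on the assumption that CPython's zlib releases the GIL while compressing; if that does not hold on a given build, a process pool would be the fallback.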

    Enjoy!

    11 Jun 2015

    Using Pi as a home-based Media Server

    This is among a series of articles on my experience with the Pi.

    This article is about using a Pi as a primitive low-end File-Server for your home network:

    Expectations:

    • Torrent-Server
      • Download Torrents
      • Store on a File-Share
    • Windows Share
      • Serve the File-Share as a Windows Share Drive
      • Allow Read / Write to this Windows Share Drive
    • Use File-Share as Media-Server
      • Using any Smartphone / Laptop
        • Play Movies using VLC
        • View all Photos / Home-Videos


    Effectively:
    1. Always on
      1. Always accepting new Torrent requests
      2. Instantly start downloading 
    2. In Real-time
      1. Allow users to view torrent download status
    3. Use any UPnP Phone App to play Video content over WiFi on a SmartPhone
    4. Use VLC (Network Streaming) to play any Video over WiFi on Laptop / Desktop
    5. Use Windows Share Drive and view all Photos / Home-Videos as needed

    How-To:
    • On Server
      • Install Torrent-Daemon
      • Configure Torrent-Daemon to listen on RPC requests
        • Here's the howto for that
      • Configure Samba
        • Set the Download folder to be shared
    • On Windows
      • Install Transmission-GUI-Remote
      • Configure GUI to use the RPC based Torrent-Daemon server
      • Make this application the default for .torrent & magnet files
    • On Linux
      • Install Transmission-Remote
        • sudo apt-get install transmission-remote
      • Configure that to use Daemon (instead of downloading directly with transmission-cli)
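
As a rough illustration of the daemon-side configuration, the Python sketch below prepares the handful of settings.json entries that matter for this setup. The key names are the ones transmission-daemon uses as far as I know; the download path, the whitelist and the starting values are made-up placeholders. The daemon should be stopped before its settings file is edited, since it may rewrite the file on exit.

```python
import json

def enable_rpc(settings: dict, download_dir: str, whitelist: str) -> dict:
    """Return a copy of the settings with RPC access and the download folder set."""
    updated = dict(settings)
    updated.update({
        "rpc-enabled": True,
        "rpc-whitelist-enabled": True,
        "rpc-whitelist": whitelist,      # comma-separated client addresses
        "download-dir": download_dir,    # the folder that Samba also shares
    })
    return updated

# Placeholder values for illustration only.
current = {"rpc-enabled": False, "rpc-port": 9091}
new_settings = enable_rpc(current, "/disk/downloads", "127.0.0.1,192.168.1.*")
print(json.dumps(new_settings, indent=4, sort_keys=True))
```

The remote clients (Transmission-GUI-Remote on Windows, transmission-remote on Linux) then only need the server address and the RPC port.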

    Pros/Cons:
    • Pros
      • Once torrent download has begun, client can disconnect / shutdown client computer
      • Server continues to get torrents, after a restart
      • miniDLNA serving speed pretty decent
        • Watching a movie (over WiFi) on VLC
          • CPU ratio barely 0.01 which is pretty decent
            • Should be able to easily serve a small army :)
              • If no other IO is happening
              • If WiFi isn't the bottleneck 
    • Cons
      • Storage on Pi needs to be managed from time-to-time
        • Currently is highly adhoc based
          • Truly taking RAID concept to heart!!
            • Have 6 Pen Drives (ranging from 4GB to 16GB)
              • 8GBs are INR ~170 ( $3 )
              • All connected via 3 USB Hubs (both USB 2.0 and USB 3.0)
                • One Powered, two non-powered hubs
              • Most on Btrfs
                • Pretty stable, surprising that still get 1+ MBps with Pi's CPU
              • Some VFAT fs since have need of moving stuff off of Pi to a Windows Laptop
              • All mounted under /disk
                • Very temporary arrangement
                  • Ideally am looking for a LVM solution (which allows me to remove / add pen-drives on the fly and thus can be called as one) that actually is suited for this purpose.
              • But miniDLNA picks up Photo / Audio / Video pretty well from all different mounted folders
      • Download speed limitations
        • Internet Speed = 2 MBps
        • Daemon Download speed capping = 1Mbps
          • But PI never reaches it :D !
            • Just imagine the poor configuration
              • Pi + Btrfs + USB 2.0 + 3 USB Chain + Unpowered Hub
                • Still consistent at 450kbps! Impressive!
      • Lack of Transmission-Daemon configuration tool means all configuration has to be done via black-screen configuration files
    • Careful configuration
      • Give all access to 'all' users only if you're sure that all users are going to be careful

    Display IMDb Ratings on Einthusan

    Technical Features ...