Posted in Uncategorized | Leave a comment

Cassandra Links

Cloud Times web site: why NoSQL?

Big Data – Data Never Sleeps

The Cassandra paper

The Cassandra RefCard

Cassandra roots traced back to Facebook

Cassandra API example on dumping data from it

Few words on unstructured database blog

Cassandra winning the NoSQL race 2009

Cassandra quick tour

Cassandra getting started

WTF is a supercolumn and more

Should the terminology change

Replication in Cassandra and HBase blog

client choices in Cassandra client drivers

client Cassandra API

Spring Data Cassandra API

GUI for Cassandra

Compare of Cassandra

Clones of Big Table on Cassandra

  • Written in: Java
  • Main point: Best of BigTable and Dynamo
  • License: Apache
  • Protocol: Thrift and a custom binary protocol (CQL3)
  • Tunable trade-offs for distribution and replication (N, R, W)
  • Querying by column, range of keys (Requires indices on anything that you want to search on)
  • BigTable-like features: columns, column families
  • Can be used as a distributed hash-table, with an “SQL-like” language, CQL (but no JOIN!)
  • Data can have expiration (set on INSERT)
  • Writes can be much faster than reads (when reads are disk-bound)
  • Map/reduce possible with Apache Hadoop
  • All nodes are similar, as opposed to Hadoop/HBase
  • Very good and reliable cross-datacenter replication
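
The (N, R, W) trade-off in the list above can be made concrete with a little arithmetic. This is only a sketch and talks to no cluster: it assumes the usual Dynamo-style rule that a read is guaranteed to overlap the latest write when R + W > N, and that a quorum is a majority of the N replicas. The helper names are made up for this example and are not part of any Cassandra client.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; not part of any Cassandra client API.

# A quorum is a majority of the N replicas.
sub quorum {
    my ($n) = @_;
    return int($n / 2) + 1;
}

# With N replicas, a read of R replicas is guaranteed to include at least one
# replica that acknowledged the latest write to W replicas when R + W > N.
sub read_overlaps_write {
    my ($n, $r, $w) = @_;
    return ($r + $w > $n) ? 1 : 0;
}

my $n = 3;            # replication factor (assumed value)
my $q = quorum($n);   # majority of 3 replicas

# Try: read one / write one, quorum / quorum, read one / write all.
for my $pair ([1, 1], [$q, $q], [1, $n]) {
    my ($r, $w) = @$pair;
    printf "N=%d R=%d W=%d -> %s\n", $n, $r, $w,
        read_overlaps_write($n, $r, $w)
            ? 'read overlaps latest write'
            : 'read may be stale';
}
```

Lower R and W make each operation cheaper at the cost of possibly stale reads; raising either one buys consistency back.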

Best used: When you write more than you read (logging). If every component of the system must be in Java. (“No one gets fired for choosing Apache’s stuff.”)

For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is data analysis.

Cassandra vs HBase NoSQL battle

Cassandra at Twitter

why they moved to Cassandra blog

blog big data overview slides

cassandra myths infoq

cassandra maven

how do i cassandra slides

cassandra under hood

Big Data and NoSQL links

Hadoop intro by Hortonworks, LinkedIn engineer (YouTube)

The Cloudera video

Hadoop ecosystem  map with Hortonworks

what is hadoop – quick answer

Intro HBase

Hadoop 2

Database design with hadoop

Columnar vs key-value

Talk with Jim Goodnight on analytics and big data Datanami Live

Lars George talk about file locality blog

The Hadoop Distributed File System description

The Yahoo Hadoop tutorial

HDFS and Map Reduce name node overview

Big Data for Dummies

The engineering behind Facebook messages moves to Hadoop 

architects zone on

Facebook under the hood article

IBM Hadoop space

Hadoop realtime at Facebook paper

Map Reduce paper

The NoSQL ecosystem description

NoSQL data modeling blog

whatisbigdata blog

NoSQL and MySQL at Craigslist

Jeremy Zawodny on lessons learned migrating to NoSQL

NoSQL is what?

EMS data

My first experience with medical data came with the Department of Emergency Medicine at the University of North Carolina at Chapel Hill. The medical record information system was known as PreMIS, and it collected emergency patient care data throughout the state of North Carolina.

learning clinical

Clinical data is data pertaining to the actual observation and treatment of patients.

SDTM Basics:

CDISC Define-XML: metadata

General classes of observations on subjects (I, E, F): Interventions, Events, Findings

General structures: identifiers, topic variables, timing variables, qualifiers

Model : wraps the observations

Domain : series of observations


  • identifier variables – identify the study, the subject, the domain, and the sequence number
  • topic variables (name of the test)
  • timing variables (start and end date)
  • qualifier variables (numeric units)
  • rule variables (algorithm)
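
To see how the variable roles fit together, here is a rough sketch of a single observation as a Perl hash. It assumes a Vital Signs (VS) record using the commonly seen SDTM variable names STUDYID, DOMAIN, USUBJID, VSSEQ, VSTESTCD, VSORRES, VSORRESU and VSDTC. All the values are made up, and the role table and helper function are only for illustration.

```perl
use strict;
use warnings;

# One illustrative Vital Signs (VS) observation. All values are made up.
my %record = (
    STUDYID  => 'STUDY01',       # which study
    DOMAIN   => 'VS',            # which domain
    USUBJID  => 'STUDY01-001',   # which subject
    VSSEQ    => 1,               # sequence number within the subject
    VSTESTCD => 'SYSBP',         # name of the test
    VSORRES  => 120,             # result as collected
    VSORRESU => 'mmHg',          # units of the result
    VSDTC    => '2013-05-17',    # date of the measurement
);

# Role played by each variable (an assumption made for this sketch).
my %role_of = (
    STUDYID  => 'identifier',
    DOMAIN   => 'identifier',
    USUBJID  => 'identifier',
    VSSEQ    => 'identifier',
    VSTESTCD => 'topic',
    VSORRES  => 'qualifier',
    VSORRESU => 'qualifier',
    VSDTC    => 'timing',
);

# Group variable names by role; names are sorted within each role.
sub variables_by_role {
    my ($roles) = @_;
    my %grouped;
    for my $var (sort keys %$roles) {
        push @{ $grouped{ $roles->{$var} } }, $var;
    }
    return \%grouped;
}

my $grouped = variables_by_role(\%role_of);
for my $role (sort keys %$grouped) {
    print "$role: ",
        join(', ', map { "$_=$record{$_}" } @{ $grouped->{$role} }), "\n";
}
```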

Clinical Trials

SDTM (Study Data Tabulation Model) defines a standard structure for human clinical trial (study) data tabulations that are to be submitted as part of a product application to a regulatory authority such as the United States Food and Drug Administration (FDA). The Submission Data Standards team of the Clinical Data Interchange Standards Consortium (CDISC) defines SDTM.

matching in Perl

  • [ ] (square brackets) match any one of a set of characters: put the options inside the brackets, e.g. [abc] matches one of a, b or c
  • . (dot) matches any single character except a newline
  • + matches one or more of the preceding character, so .+ matches one or more of any character (handy for matching several characters in the middle)
  • .* matches zero or more characters
  • ? matches zero or one of the preceding character
  • \ (backslash) says that the following character is to be matched literally, and not treated as some fancy control character
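
A small script makes the rules above easier to check. This is only a sketch: the `matches` helper and the sample words are made up for the illustration.

```perl
use strict;
use warnings;

# Returns 1 if $string matches $pattern, else 0 (hypothetical helper).
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ $pattern ? 1 : 0;
}

my @cases = (
    [ qr/gr[ae]y/, 'grey'   ],   # [ae] matches one of 'a' or 'e'
    [ qr/gr[ae]y/, 'groy'   ],   # 'o' is not in the set
    [ qr/c.t/,     'cut'    ],   # . matches any one character
    [ qr/c.t/,     "c\nt"   ],   # ... except a newline
    [ qr/a.+z/,    'az'     ],   # .+ needs at least one character
    [ qr/a.*z/,    'az'     ],   # .* is happy with zero characters
    [ qr/colou?r/, 'color'  ],   # u? makes the 'u' optional
    [ qr/colou?r/, 'colour' ],
    [ qr/3\.14/,   '3x14'   ],   # \. matches only a literal dot
    [ qr/3.14/,    '3x14'   ],   # unescaped . matches the 'x'
);

for my $case (@cases) {
    my ($pattern, $string) = @$case;
    (my $shown = $string) =~ s/\n/\\n/g;   # make the newline visible
    printf "%-20s %-10s %s\n", $pattern, "'$shown'",
        matches($pattern, $string) ? 'match' : 'no match';
}
```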

modifiers: /test/i

  • i – case insensitivity
  • s – treats the string as a single line, so that . also matches a newline; /foo.bar/s can match foo on one line and bar on the next
  • m – allows ^ and $ to match just after and just before each newline inside the string, not only at its start and end
  • g – matches globally and keeps track of where in the string it left off; \G means the end of the previous match
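
The modifiers can be tried side by side on one string. This is a sketch with a made-up two-line sample; each pattern is run with and without its modifier so the difference shows.

```perl
use strict;
use warnings;

my $text = "Foo\nbar baz\n";   # made-up sample with an embedded newline

# /i: "foo" matches "Foo" only when case is ignored.
my $plain  = $text =~ /foo/  ? 1 : 0;
my $with_i = $text =~ /foo/i ? 1 : 0;

# /s: the dot can cross the newline between "Foo" and "bar".
my $dot    = $text =~ /Foo.bar/  ? 1 : 0;
my $with_s = $text =~ /Foo.bar/s ? 1 : 0;

# /m: ^ can match at the start of the second line.
my $caret  = $text =~ /^bar/  ? 1 : 0;
my $with_m = $text =~ /^bar/m ? 1 : 0;

# /g in list context returns every match at once.
my @words = $text =~ /\w+/g;

# /g in scalar context resumes where the last match ended; pos() reports
# that offset after each successful match.
my $scan = $text;
my @ends;
while ($scan =~ /ba/g) {
    push @ends, pos($scan);
}

print "i: $plain -> $with_i\n";
print "s: $dot -> $with_s\n";
print "m: $caret -> $with_m\n";
print "words: @words\n";
print "match ends: @ends\n";
```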

extract information from part of a match: /alpha(.+)gamma/

  • “xxalphazzzgamma”
  • “alpha beta gamma delta”

What do the parentheses achieve? The answer is simple: everything matched inside the parentheses is put into the Perl variable $1. (If you have a second set of parentheses, the contents of that set go into $2, and so on.)
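
Running the pattern against the two strings above shows what lands in $1. This is a sketch: the helper name is made up, and the 'colour=blue' input is an invented example for $2.

```perl
use strict;
use warnings;

# Returns what (.+) captured between "alpha" and "gamma",
# or undef when the pattern does not match (hypothetical helper).
sub between_alpha_gamma {
    my ($string) = @_;
    return $string =~ /alpha(.+)gamma/ ? $1 : undef;
}

for my $string ('xxalphazzzgamma', 'alpha beta gamma delta') {
    my $captured = between_alpha_gamma($string);
    print "'$string' -> '$captured'\n";
}

# Two sets of parentheses fill $1 and $2.
my ($key, $value);
if ('colour=blue' =~ /(\w+)=(\w+)/) {
    ($key, $value) = ($1, $2);
    print "key=$key value=$value\n";
}
```

Note that the second string captures the spaces around "beta" as well, since .+ takes everything between "alpha" and "gamma".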

\n newline (line feed)
\w a word character [a-zA-Z0-9_]
\W NOT a word character, that is [^a-zA-Z0-9_]
\s white space (new line, carriage return, space, tab, form feed)
\S NOT white space
\d a digit [0-9]
\D NOT a digit, i.e. [^0-9]

  • \b Match a word boundary
  • \B Match a non-(word boundary)
  • \A Match only at beginning of string
  • \Z Match only at end of string, or before newline at the end
  • \z Match only at end of string
  • \G Match only where previous m//g left off (works only with /g)
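
The anchors are easiest to tell apart on strings chosen to separate them. This is a sketch with made-up sample strings; the /c modifier used in the \G loop (keep the position when a match fails) is not covered in the notes above.

```perl
use strict;
use warnings;

# \b and \B: "cat" as a whole word versus "cat" inside another word.
my $phrase = 'cat concat';
my $whole_words = () = $phrase =~ /\bcat\b/g;   # count of whole-word matches
my $any_cat     = () = $phrase =~ /cat/g;       # count of all matches
my $inside_word = $phrase =~ /\Bcat/ ? 1 : 0;   # the "cat" in "concat"

# \A stays at the very start of the string even under /m, unlike ^.
my $two_lines = "first\nsecond";
my $caret_m = $two_lines =~ /^second/m  ? 1 : 0;
my $big_a_m = $two_lines =~ /\Asecond/m ? 1 : 0;

# \Z tolerates one trailing newline; \z does not.
my $with_newline = "end\n";
my $big_z   = $with_newline =~ /end\Z/ ? 1 : 0;
my $small_z = $with_newline =~ /end\z/ ? 1 : 0;

# \G continues exactly where the previous /g match stopped, so this loop
# collects the leading digits and stops at the first non-digit.
my $input = '12ab34';
my @leading_digits;
while ($input =~ /\G(\d)/gc) {
    push @leading_digits, $1;
}
my $stopped_at = pos($input);

print "whole words: $whole_words, any: $any_cat, inside a word: $inside_word\n";
print "^ with /m: $caret_m, \\A with /m: $big_a_m\n";
print "\\Z: $big_z, \\z: $small_z\n";
print "leading digits: @leading_digits (stopped at offset $stopped_at)\n";
```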