Monday, December 4, 2017
CAP Theorem (update)
Reading further on the CAP theorem, I found this post from Daniel Abadi, where the author explains the limitations of the CAP theorem and proposes a revisited version, PACELC, where latency is considered alongside consistency, availability, and partition tolerance.
Thursday, November 16, 2017
CAP Theorem
A useful reminder on the CAP theorem (aka Brewer's theorem), as I was studying a MOOC on HDFS.
What the CAP theorem says is that in a distributed system, it is impossible to guarantee more than two out of the three following properties:
- Consistency: every read receives the most recent version of the data.
- Availability: every read receives an answer, without the guarantee that it contains the most recent write.
- Partition tolerance: the system keeps operating even when the network between the nodes fails (except in case of a total breakdown).
Some well-known systems can be positioned as follows:
- C+A: Oracle, MySQL
- C+P: HDFS, MongoDB
- A+P: Redis, Elasticsearch
Wednesday, July 5, 2017
A full table scan is not always bad, an index is not always good
This is a well-known fact: (almost) every developer knows that "sometimes, a full table scan can perform better than using an index".
Let's show this in a little demonstration, which I picked up from Tom Kyte's Expert Oracle Database Architecture. The purpose of the demo is to show that an index access can be more expensive than a full table scan, under some circumstances.
I'm running this demo on an Oracle XE 11gR2 instance on a Lenovo X1 Yoga notebook running Windows 10 (64-bit), with an Intel Core i7-6600U CPU (2.60 GHz) and 16 GB of RAM.
Setup
Let's first create a table to hold data with a column having values sorted:
CREATE TABLE ORDERED (X NUMBER, Y VARCHAR2(2000), PRIMARY KEY(X));
Now populate ORDERED, taking care that the values in column X are inserted in sorted order, from 1 to 1,000,000:
begin
  for i in 1 .. 1000000
  loop
    -- X ascends from 1 to 1000000; Y is 75 characters of random padding
    insert into ordered
    values (i, rpad(dbms_random.random, 75, '*'));
  end loop;
end;
/
commit;
... then create a table UNORDERED with an identical structure, and populate it with the data from the first table, putting the values of column X in random order (ordering by the random column Y shuffles X):
CREATE TABLE UNORDERED AS SELECT X, Y FROM ORDERED ORDER BY Y;
ALTER TABLE UNORDERED ADD CONSTRAINT PK_UNORDERED PRIMARY KEY (X);
And before starting with the demo, gather the statistics:
begin
  dbms_stats.gather_table_stats(USER, 'UNORDERED');
  dbms_stats.gather_table_stats(USER, 'ORDERED');
end;
/
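The plans shown in the next section can be displayed, after each EXPLAIN PLAN statement, with the standard DBMS_XPLAN utility:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);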
Comparing the execution plans
1. Table with ordered data:
SQL statement:
EXPLAIN PLAN FOR SELECT * FROM ORDERED WHERE X BETWEEN 10000 AND 30000;
------------------------------------------------------------------------------------------
|Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20002 | 1582K| 323 (0)| 00:00:04 |
| 1 | TABLE ACCESS BY INDEX ROWID| ORDERED | 20002 | 1582K| 323 (0)| 00:00:04 |
|* 2 | INDEX RANGE SCAN | SYS_C007666 | 20002 | | 68 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
2. Table with unordered data, full table scan:
SQL statement:
EXPLAIN PLAN FOR SELECT * FROM UNORDERED WHERE X BETWEEN 10000 AND 30000;
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20002 | 1582K| 3275 (1)| 00:00:40 |
|* 1 | TABLE ACCESS FULL| UNORDERED | 20002 | 1582K| 3275 (1)| 00:00:40 |
-------------------------------------------------------------------------------
3. Table with unordered data, using the index:
SQL statement:
EXPLAIN PLAN FOR SELECT /*+ INDEX(UNORDERED PK_UNORDERED) */ *
FROM UNORDERED WHERE X BETWEEN 10000 AND 30000;
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20002 | 1582K| 20049 (1)| 00:04:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| UNORDERED | 20002 | 1582K| 20049 (1)| 00:04:01 |
|* 2 | INDEX RANGE SCAN | PK_UNORDERED | 20002 | | 44 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
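Why such a gap? In ORDERED the rows are physically stored in the same order as the index on X, so consecutive index entries point into the same blocks; in UNORDERED the matching rows are scattered over the whole table. Oracle captures this in the index clustering factor, which can be checked in the data dictionary (a small sketch; on my instance the ORDERED primary key got the system-generated name SYS_C007666 shown above):
SELECT INDEX_NAME, CLUSTERING_FACTOR, NUM_ROWS
FROM USER_INDEXES
WHERE TABLE_NAME IN ('ORDERED', 'UNORDERED');
A clustering factor close to the number of blocks in the table favors index access; one close to the number of rows, as for UNORDERED, makes it expensive.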
Conclusion
The execution plans display a much higher cost when the index is used on an unordered set of data, in this particular case. This might be unexpected, but it shows how important physical design is. Blindly adding indexes wherever you encounter a full table scan is not always a good idea.
Trying to override the optimizer's first choice might not be a good idea either. Oddly, there is often a developer in the room tempted to prevent the optimizer from using a full table scan. Not good.
Finally, let's also point out that maintaining an index comes with its own costs.
Friday, June 30, 2017
What is the difference between npm install and npm update?
I keep doing stuff that I don't understand.
I was wondering why we were instructed to always run npm install and npm update after pulling from the repository. Couldn't a single command do it?
So, here's a short note on the difference between npm install and npm update:
Sample package.json file:
{
  "name": "my-project",
  "version": "1.0",                                   // install    update
  "dependencies": {                                   // -------    ------
    "already-installed-versionless-module": "*",      // ignores    "1.0" -> "1.1"
    "already-installed-semver-module": "^1.4.3",      // ignores    "1.4.3" -> "1.5.2"
    "already-installed-versioned-module": "3.4.1",    // ignores    ignores
    "not-yet-installed-versionless-module": "*",      // installs   installs
    "not-yet-installed-semver-module": "^4.2.1",      // installs   installs
    "not-yet-installed-versioned-module": "2.7.8"     // installs   installs
  }
}
Conclusion: The only big difference is that an already installed module with fuzzy versioning ...
- gets ignored by npm install
- gets updated by npm update
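By the way, to see beforehand which modules npm update would touch, npm ships a handy subcommand:
npm outdated
For every installed dependency it lists the current version, the "wanted" version according to the semver range in package.json, and the latest published version.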
Thursday, June 29, 2017
PL/SQL: a simple cursor example
Here is a very basic demonstration of a PL/SQL program that updates data in a table, using an explicit cursor FOR loop.
The program is based on the EMPLOYEES table of the HR schema.
DECLARE
  -- FOR UPDATE locks the selected rows for the duration of the transaction
  CURSOR C1 IS
    SELECT * FROM EMPLOYEES WHERE DEPARTMENT_ID = 60
    FOR UPDATE;
BEGIN
  SAVEPOINT TX_START;
  FOR RC IN C1 LOOP
    DBMS_OUTPUT.PUT_LINE('Fetching record ' || RC.LAST_NAME);
    IF RC.SALARY > 5000 THEN
      -- WHERE CURRENT OF updates exactly the row just fetched by the cursor
      UPDATE EMPLOYEES SET SALARY = SALARY * 1.1 WHERE CURRENT OF C1;
    END IF;
  END LOOP;
EXCEPTION
  WHEN OTHERS THEN
    -- undo everything since the savepoint, report, and re-raise
    ROLLBACK TO TX_START;
    DBMS_OUTPUT.PUT_LINE(SQLERRM);
    RAISE;
END;
/
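To actually see the DBMS_OUTPUT messages in SQL*Plus or SQL Developer, remember to enable server output first:
SET SERVEROUTPUT ON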
Thursday, May 11, 2017
How to use XStream to generate XML from objects
I have to instrument my code to generate tests. For that purpose, I use serialization to capture the state of my objects on the server while running some system tests, and I reuse the generated files in my integration tests through deserialization.
My colleague Fabrizio provided me with a neat utility that generates JSON files from all my serializable classes.
But I ran into difficulties with the complexity of some object graphs, so I use XStream to generate XML instead.
Here's how to dump an object to xml:
import com.thoughtworks.xstream.XStream;
import java.io.*;

XStream xstream = new XStream();
String xml = xstream.toXML(myObject);
String filePath = "output.xml";
try {
    // redundant: the FileOutputStream below creates the file anyway
    new File(filePath).createNewFile();
    try (Writer writer = new BufferedWriter(new OutputStreamWriter(
            new FileOutputStream(filePath), "utf-8"))) {
        writer.write(xml);
        writer.flush();
    }
} catch (IOException e) {
    e.printStackTrace();
}
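Reading the captured state back for a test is just as short (a minimal sketch; MyObject stands for whatever serializable class was dumped above):
XStream xstream = new XStream();
try (Reader reader = new InputStreamReader(new FileInputStream("output.xml"), "utf-8")) {
    // fromXML rebuilds the object graph from the captured file
    MyObject restored = (MyObject) xstream.fromXML(reader);
    // ... run the test assertions against the restored object ...
} catch (IOException e) {
    e.printStackTrace();
}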
Wednesday, March 22, 2017
How to manipulate dates in SQL (demo)
This is a reminder on how to manipulate dates in SQL (Oracle).
Setup:
I'll create a simple table and populate it with a little data:
create table date_test(id number, date1 date, ts1 timestamp, date2 date, ts2 timestamp);
insert into date_test values (1,
to_date('2014-06-27', 'RRRR-MM-DD'),
to_date('2014-06-27', 'RRRR-MM-DD'),
to_date('2014-06-27', 'RRRR-MM-DD'),
to_date('2014-06-27', 'RRRR-MM-DD'));
Data selection examples:
SELECT CURRENT_TIMESTAMP FROM DUAL;

SELECT CURRENT_TIMESTAMP, CURRENT_TIMESTAMP + INTERVAL '1' SECOND(1)
FROM DUAL;
SELECT TO_CHAR (TS1, 'RRRR-MM-DD, HH24:MI:SS') AS ORI,
TO_CHAR (TS1-INTERVAL '1' SECOND(1),'RRRR-MM-DD, HH24:MI:SS') AS FORMATTED
FROM DATE_TEST;
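A few more manipulations that often come up, sketched on the same table (date arithmetic works in days; ADD_MONTHS, TRUNC and EXTRACT are standard Oracle functions):
SELECT DATE1 + 1 AS PLUS_ONE_DAY,
       ADD_MONTHS(DATE1, 3) AS PLUS_THREE_MONTHS,
       TRUNC(TS1, 'MM') AS FIRST_OF_MONTH,
       EXTRACT(YEAR FROM DATE1) AS YEAR_PART
FROM DATE_TEST;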
See also this page.
How to use the CASE clause in a SQL statement (a demo)
This is a little demo to show how to use the CASE clause in a SQL statement:
drop table two_fields;
drop table two_fields2;
create table two_fields (id number, text varchar2(200), field1 varchar2(10), field2 varchar2(10));
create table two_fields2 (id number, text varchar2(200), field12 varchar2(10));
insert into two_fields values (1, 'text1', 'f1-1', null);
insert into two_fields values (2, 'text2', 'f1-2', null);
insert into two_fields values (3, 'text3', 'f1-3', null);
insert into two_fields values (4, 'text4', null, 'f2-1');
insert into two_fields values (5, 'text5', null, 'f2-2');
insert into two_fields values (6, 'text6', null, 'f2-3');
commit;
select * from two_fields;
select * from two_fields2;
-- note: a simple CASE (case field1 when null then ...) never matches NULL, so use a searched CASE
select id, text, field1, field2, case when field1 is null then 'field2' else 'field1' end as f1f2 from two_fields;
select id, text, field1, field2, decode (field1 , null, field2, field1) f1t from two_fields;
INSERT INTO TWO_FIELDS2 (SELECT ID, TEXT, decode (field1 , null, field2, field1) from two_fields);
ALTER TABLE two_fields ADD CONSTRAINT c1 check (field1 is not null or field2 is not null);
ALTER TABLE two_fields ADD CONSTRAINT c2 check ((field1 is not null and field2 is null) OR (field1 is null and field2 is not null));
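As an aside, the DECODE used above, decode(field1, null, field2, field1), is equivalent to the more readable COALESCE (or NVL):
select id, text, coalesce(field1, field2) as f1t from two_fields;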
Find out the columns used in referential integrity constraints
Using the data dictionary views USER_CONSTRAINTS and USER_CONS_COLUMNS, it is possible to find out which columns are used in a referential integrity constraint:
SELECT
  UCC1.TABLE_NAME,
  UCC1.COLUMN_NAME,
  UCC1.CONSTRAINT_NAME,
  UC.R_CONSTRAINT_NAME,
  UCC2.TABLE_NAME AS R_TABLE_NAME
FROM USER_CONSTRAINTS UC,
     USER_CONS_COLUMNS UCC1,
     USER_CONS_COLUMNS UCC2
WHERE UC.CONSTRAINT_TYPE = 'R'
  AND UCC1.CONSTRAINT_NAME = UC.CONSTRAINT_NAME
  AND UCC2.CONSTRAINT_NAME = UC.R_CONSTRAINT_NAME
  -- match the columns pairwise, so composite keys do not produce a cartesian product
  AND UCC1.POSITION = UCC2.POSITION;
How to find out which roles a user has
I often have to find out whether a particular user has a certain role.
This is the command to read the AD roles of a user:
U:\>net user z100079 /domain
The request will be processed at a domain controller for domain mobi.mobicorp.ch.
User name Z100079
Full Name Z100079
Comment Service Account für "Vertragsverlauf", Gysling Eva
User's comment
Country/region code 000 (System Default)
Account active Yes
Account expires 30.12.2099 23:00:00
Password last set 27.01.2011 13:30:38
Password expires Never
Password changeable 27.01.2011 13:30:38
Password required Yes
User may change password Yes
Workstations allowed All
Logon script
User profile
Home directory
Last logon Never
Logon hours allowed All
Local Group Memberships
Global Group memberships *AMPS_WS *Domain Users
*ASES-INTERN *AISDV_CONSUMER
*ACONTRACTREADCLIENT *ASES-EXTERN
*ALOSSRATIOREADCLIENT *ACLAIMREADCLIENT
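Conversely, the members of one of those global groups can be listed with the net group subcommand (here with a group taken from the output above as an example):
U:\>net group AMPS_WS /domain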
Sunday, January 15, 2017
AngularJS and URLs
While trying to run a demo application from an OpenClassroom tutorial, I was confronted with the problem that all my URLs were rewritten with a #. This ruined my navigation.
From a few contributions on Stack Overflow (especially this one), I found a solution in two steps:
1. Configuring $locationProvider
- This makes the application run in HTML5 mode, removing the need to add # to the URL
2. Setting the base for relative links
1. Configuring $locationProvider
in the file app.js:
angular.module('cineAngularApp', ['ngRoute'])
    .config(function ($routeProvider, $locationProvider) {
        $routeProvider
            .when('/', {
                ...
            });
        $locationProvider.html5Mode(true);
    });
2. Setting the base for relative links
in index.html:
<html>
<base href="/">
...
</html>
Afterwards, I still had the problem that all URLs were rewritten with a %2F. This is related to the fact that AngularJS 1.6 changed the default prefix for hash-bang URLs in the $location service. You just have to configure $locationProvider to change that behavior (see this post):
$locationProvider.hashPrefix('');