

Manipulate df with multiple time-series into array (with missing dates filled)


I have a relatively large df (about 10^6 records) structured as follows:
Date,SN,Zip Code,A,B,Total,Lat,Lon
2015-09-01,10948.0,80015,0,0,1,39.626999999999995,-104.779
2015-09-01,11906.0,85392,0,0,1,33.478,-112.309
2015-09-03,10948.0,85260,0,0,1,33.611,-111.891
2015-09-03,11906.0,85050,0,0,1,33.683,-111.99799999999999
2015-09-05,12111.0,23834,0,0,1,37.291,-77.404
2015-09-05,11906.0,72761,0,0,1,36.169000000000004,-94.455
Notice that each SN (a unique identifier) has at most one record per day. On some days, some SNs have no record, which means that their Total was 0 for that day. I want to convert this df into a NumPy array that shows the Total for each day (rows) and SN (columns), with the days that are missing for an SN filled in with 0.
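To follow along, the sample rows can be loaded with read_csv. This is a sketch that assumes the data is stored as CSV in the same shape as the excerpt (inlined here with io.StringIO). usecols keeps only the three columns the pivot uses, which should trim memory on a 10^6-row file. The final check matters because pivot raises a ValueError if any (Date, SN) pair appears more than once.

```python
import io

import pandas as pd

# Sample rows copied from the question
csv_text = """Date,SN,Zip Code,A,B,Total,Lat,Lon
2015-09-01,10948.0,80015,0,0,1,39.626999999999995,-104.779
2015-09-01,11906.0,85392,0,0,1,33.478,-112.309
2015-09-03,10948.0,85260,0,0,1,33.611,-111.891
2015-09-03,11906.0,85050,0,0,1,33.683,-111.99799999999999
2015-09-05,12111.0,23834,0,0,1,37.291,-77.404
2015-09-05,11906.0,72761,0,0,1,36.169000000000004,-94.455
"""

# Keep only the columns the pivot needs and parse Date as datetime64
df = pd.read_csv(io.StringIO(csv_text), usecols=["Date", "SN", "Total"],
                 parse_dates=["Date"])

# pivot needs unique (Date, SN) pairs: verify "at most one record per day"
has_duplicates = df.duplicated(subset=["Date", "SN"]).any()
print(df.dtypes)
print("duplicate (Date, SN) pairs:", has_duplicates)
```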
You need pivot (pandas 2.0 and later require its arguments to be passed by keyword):
df.pivot(index='Date', columns='SN', values='Total').fillna(0)
#SN 10948.0 11906.0 12111.0
#Date
#2015-09-01 1.0 1.0 0.0
#2015-09-03 1.0 1.0 0.0
#2015-09-05 0.0 1.0 1.0
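A self-contained sketch of the same pivot, rebuilding the sample rows from the question (only Date, SN and Total). pivot only works because each (Date, SN) pair is unique; if that assumption ever breaks, pivot_table with aggfunc="sum" is one possible fallback, assuming that summing duplicate entries is the behaviour you want. Its fill_value=0 replaces the separate fillna(0) step.

```python
import pandas as pd

# Sample rows from the question (Zip Code, A, B, Lat, Lon omitted)
df = pd.DataFrame({
    "Date": ["2015-09-01", "2015-09-01", "2015-09-03",
             "2015-09-03", "2015-09-05", "2015-09-05"],
    "SN": [10948.0, 11906.0, 10948.0, 11906.0, 12111.0, 11906.0],
    "Total": [1, 1, 1, 1, 1, 1],
})

# One row per Date, one column per SN; absent (Date, SN) pairs are NaN, then 0
wide = df.pivot(index="Date", columns="SN", values="Total").fillna(0)
print(wide)

# pivot_table aggregates duplicate (Date, SN) pairs instead of raising;
# fill_value=0 fills the absent combinations directly
wide_sum = df.pivot_table(index="Date", columns="SN", values="Total",
                          aggfunc="sum", fill_value=0)
print(wide_sum)
```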
To get the numpy array:
df.pivot(index='Date', columns='SN', values='Total').fillna(0).to_numpy()
#array([[ 1., 1., 0.],
# [ 1., 1., 0.],
# [ 0., 1., 1.]])
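The array on its own loses the Date and SN labels, so it can help to keep them next to it. A sketch of that (the names arr, dates and serials are just for illustration): dtype=int is optional and only undoes the float type introduced by the NaN to 0 fill.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2015-09-01", "2015-09-01", "2015-09-03",
             "2015-09-03", "2015-09-05", "2015-09-05"],
    "SN": [10948.0, 11906.0, 10948.0, 11906.0, 12111.0, 11906.0],
    "Total": [1, 1, 1, 1, 1, 1],
})

wide = df.pivot(index="Date", columns="SN", values="Total").fillna(0)

# to_numpy() is the documented replacement for .values; cast back to int
arr = wide.to_numpy(dtype=int)

# Row i of arr is dates[i]; column j of arr is serials[j]
dates = wide.index.to_numpy()
serials = wide.columns.to_numpy()
print(arr)
print(serials)
```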
Update: to get all dates, including days on which no SN has a record, use reindex:
import pandas as pd

# convert Date column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# pivot to wide format
df1 = df.pivot(index='Date', columns='SN', values='Total').fillna(0)

# reindex to a daily range; days absent from the data come back as NaN, so fill them with 0
df1.reindex(pd.date_range(df1.index.min(), df1.index.max())).fillna(0)
# SN 10948.0 11906.0 12111.0
#2015-09-01 1.0 1.0 0.0
#2015-09-02 0.0 0.0 0.0
#2015-09-03 1.0 1.0 0.0
#2015-09-04 0.0 0.0 0.0
#2015-09-05 0.0 1.0 1.0
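Two variations on the last step, sketched on the sample rows. Variant 1 is the same reindex approach with a single fillna(0) (it covers both the SN gaps and the missing days) and a named date_range so the "Date" index label survives. Variant 2 is a different technique, not the one in this answer: for a 10^6-row frame it writes the totals straight into a preallocated NumPy array using day offsets as row positions and pd.factorize codes as column positions. Treat it as an untested-at-scale sketch; with duplicate (Date, SN) pairs the last write wins, and np.add.at could be swapped in to sum them instead.

```python
import numpy as np
import pandas as pd

# Sample rows from the question (only the columns the pivot needs)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-09-01", "2015-09-01", "2015-09-03",
                            "2015-09-03", "2015-09-05", "2015-09-05"]),
    "SN": [10948.0, 11906.0, 10948.0, 11906.0, 12111.0, 11906.0],
    "Total": [1, 1, 1, 1, 1, 1],
})

# Variant 1: one fillna(0) after reindex; name="Date" keeps the index label
all_days = pd.date_range(df["Date"].min(), df["Date"].max(), freq="D", name="Date")
wide = (df.pivot(index="Date", columns="SN", values="Total")
          .reindex(all_days)
          .fillna(0))

# Variant 2: fill a preallocated array by integer positions
day = (df["Date"] - df["Date"].min()).dt.days.to_numpy()  # row = days since first date
col, serials = pd.factorize(df["SN"], sort=True)          # column = position of sorted SN
arr = np.zeros((day.max() + 1, len(serials)))
arr[day, col] = df["Total"].to_numpy()  # last write wins on duplicate pairs

print(wide)
print(arr)
```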
